(57) Incorporation of certain amino acid analogs into
polypeptides produced by cells which do not ordinarily 
provide polypeptides containing such amino acid ana- 
logs is accomplished by subjecting the cells to growth 
media containing such amino acid analogs. The degree 
of incorporation can be regulated by adjusting the con- 
centration of amino acid analogs in the media and/or by 
adjusting osmolality of the media. Such incorporation allows
the chemical and physical characteristics of
polypeptides to be altered and studied. In addition, nucleic
acid and corresponding proteins including a domain
from a physiologically active peptide and a domain
from an extracellular matrix protein which is capable of 
providing a self-aggregate are provided. Human extra- 
cellular matrix proteins capable of providing a self-ag- 
gregate collagen are provided which are produced by 
prokaryotic cells. Preferred codon usage is employed to 
produce extracellular matrix proteins in prokaryotes.
Description
BACKGROUND 
1. Technical Field

[0001] Engineered polypeptides and chimeric polypeptides having incorporated amino acids which enhance or oth- 
erwise modify properties of such polypeptides. 

2. Description of Related Art

[0002] Genetic engineering allows polypeptide production to be transferred from one organism to another. In doing 
so, a portion of the production apparatus indigenous to an original host is transplanted into a recipient. Frequently, the 
original host has evolved certain unique processing pathways in association with polypeptide production which are not 

contained in or transferred to the recipient. For example, it is well known that mammalian cells incorporate a complex
set of post-translational enzyme systems which impart unique characteristics to protein products of the systems. When 
a gene encoding a protein normally produced by mammalian cells is transferred into a bacterial or yeast cell, the protein 
may not be subjected to such post-translational modification and the protein may not function as originally intended.
[0003] Normally, the process of polypeptide or protein synthesis in living cells involves transcription of DNA into RNA 

and translation of RNA into protein. Three forms of RNA are involved in protein synthesis: messenger RNA (mRNA)
carries genetic information to ribosomes made of ribosomal RNA (rRNA) while transfer RNA (tRNA) links to free amino
acids in the cell pool. Amino acid/tRNA complexes line up next to codons of mRNA, with actual recognition and binding
being mediated by tRNA. Cells can contain up to twenty amino acids which are combined and incorporated in sequences
of varying permutations into proteins. Each amino acid is distinguished from the other nineteen amino acids and charged
to tRNA by enzymes known as aminoacyl-tRNA synthetases. As a general rule, amino acid/tRNA complexes are quite
specific and normally only a molecule with an exact stereochemical configuration is acted upon by a particular ami- 
noacyl-tRNA synthetase. 

[0004] In many living cells some amino acids are taken up from the surrounding environment and some are synthe- 
sized within the cell from precursors, which in turn have been assimilated from outside the cell. In certain instances, 
a cell is auxotrophic, i.e., it requires a specific growth substance beyond the minimum required for normal metabolism
and reproduction which it must obtain from the surrounding environment. Some auxotrophs depend upon the external
environment to supply certain amino acids. This feature allows certain amino acid analogs to be incorporated into 
proteins produced by auxotrophs by taking advantage of relatively rare exceptions to the above rule regarding stere- 
ochemical specificity of aminoacyl-tRNA synthetases. For example, proline is such an exception, i.e., the amino acid 
activating enzymes responsible for the synthesis of the prolyl-tRNA complex are not as specific as others. As a consequence,
certain proline analogs have been incorporated into bacterial, plant, and animal cell systems. See Tan et al.,
Proline Analogues Inhibit Human Skin Fibroblast Growth and Collagen Production in Culture, Journal of Investigative 
Dermatology, 80:261-267(1983). 

[0005] A method of incorporating unnatural amino acids into proteins is described, e.g., in Noren et al., A General 
Method For Site-Specific Incorporation of Unnatural Amino Acids Into Proteins, Science, Vol. 244, pp. 182-188 (1989)
wherein chemically acylated suppressor tRNA is used to insert an amino acid in response to a stop codon substituted
for the codon encoding the residue of interest. See also, Dougherty et al., Synthesis of a Genetically Engineered Repetitive
Polypeptide Containing Periodic Selenomethionine Residues, Macromolecules, Vol. 26, No. 7, pp. 1779-1781 (1993),
which describes subjecting an E. coli methionine auxotroph to selenomethionine-containing medium and postulates
on the basis of experimental data that selenomethionine may completely replace methionine in all proteins produced
by the cell. 

[0006] cis-Hydroxy-L-proline has been used to study its effects on collagen by incorporation into eukaryotic cells
such as cultured normal skin fibroblasts (see Tan et al., supra) and tendon cells from chick embryos (see e.g., Uitto et
al., Procollagen Polypeptides Containing cis-4-Hydroxy-L-proline are Overglycosylated and Secreted as Nonhelical
Pro-γ-Chains, Archives of Biochemistry and Biophysics, 185:1:214-221(1978)). However, investigators found that
trans-4-hydroxyproline would not link with proline specific tRNA of prokaryotic E. coli. See Papas et al., Analysis of the
Amino Acid Binding to the Proline Transfer Ribonucleic Acid Synthetase of Escherichia coli, Journal of Biological Chemistry,
245:7:1588-1595(1970). Another unsuccessful attempt to incorporate trans-4-hydroxyproline into prokaryotes is
described in Deming et al., In Vitro Incorporation of Proline Analogs into Artificial Proteins, Poly. Mater. Sci. Engin.
Proceed., Vol. 71, p. 673-674 (1994). Deming et al. report surveying the potential for incorporation of certain proline
analogs, i.e., L-azetidine-2-carboxylic acid, L-γ-thiaproline, 3,4-dehydroproline and L-trans-4-hydroxyproline into artificial
proteins expressed in E. coli cells. Only L-azetidine-2-carboxylic acid, L-γ-thiaproline and 3,4-dehydroproline are
reported as being incorporated into proteins in E. coli cells in vivo.
[0007] Extracellular matrix proteins ("EMPs") are found in spaces around or near cells of multicellular organisms and
are typically fibrous proteins of two functional types: mainly structural, e.g., collagen and elastin, and mainly adhesive, 
e.g., fibronectin and laminin. Collagens are a family of fibrous proteins typically secreted by connective tissue cells. 
Twenty distinct collagen chains have been identified which assemble to form a total of about ten different collagen 
molecules. A general discussion of collagen is provided by Alberts, et al., The Cell, Garland Publishing, pp. 802-823
(1989), incorporated herein by reference. Other fibrous or filamentous proteins include Type I IF proteins, e.g., keratins; 
Type II IF proteins, e.g., vimentin, desmin and glial fibrillary acidic protein; Type III IF proteins, e.g., neurofilament 
proteins; and Type IV IF proteins, e.g., nuclear lamins.

[0008] Type I collagen is the most abundant form of the fibrillar, interstitial collagens and is the main component of 
the extracellular matrix. Collagen monomers consist of about 1000 amino acid residues in a repeating array of Gly-X-Y
triplets. Approximately 35% of the X and Y positions are occupied by proline and trans-4-hydroxyproline. Collagen
monomers associate into triple helices which consist of one α2 and two α1 chains. The triple helices associate into
fibrils which are oriented into tight bundles. The bundles of collagen fibrils are further organized to form the scaffold 
for extracellular matrix. 
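
The Gly-X-Y repeat described above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function name `gly_xy_stats` is hypothetical, sequences are written in one-letter amino acid codes, and the letter `O` is assumed to stand for 4-hydroxyproline (an informal convention, not one defined in this document).

```python
def gly_xy_stats(seq: str) -> dict:
    """Summarize a collagen-like sequence given in one-letter codes.

    Assumption: 'O' denotes 4-hydroxyproline and 'P' denotes proline.
    """
    seq = seq.upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")

    # Split the chain into consecutive three-residue units.
    triplets = [seq[i:i + 3] for i in range(0, len(seq), 3)]

    # A Gly-X-Y repeat has glycine in the first position of every triplet.
    is_gly_xy = all(t[0] == "G" for t in triplets)

    # Collect the X and Y positions and count proline or hydroxyproline.
    xy_residues = "".join(t[1:] for t in triplets)
    imino_count = sum(1 for r in xy_residues if r in "PO")
    fraction = imino_count / len(xy_residues) if xy_residues else 0.0

    return {
        "triplets": len(triplets),
        "is_gly_xy": is_gly_xy,
        "imino_fraction": fraction,
    }


if __name__ == "__main__":
    # Hypothetical nine-residue peptide used only as an example input.
    print(gly_xy_stats("GPOGPAGEK"))
```

For a real collagen chain, the reported fraction could be compared with the approximate 35% occupancy of X and Y positions stated above.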

[0009] In mammalian cells, post-translational modification of collagen contributes to its ultimate chemical and physical
properties and includes proteolytic digestion of pro-regions, hydroxylation of lysine and proline, and glycosylation
of hydroxylated lysine. The proteolytic digestion of collagen involves the cleavage of pro regions from the N and C 
termini. It is known that hydroxylation of proline is essential for the mechanical properties of collagen. Collagen with 
low levels of 4-hydroxyproline has poor mechanical properties, as highlighted by the sequelae associated with scurvy. 

4-hydroxyproline adds stability to the triple helix through hydrogen bonding and through restricting rotation about C-N
bonds in the polypeptide backbone. In the absence of a stable structure, naturally occurring cellular enzymes contribute 
to degrading the collagen polypeptide. 

[0010] The structural attributes of Type I collagen along with its generally perceived biocompatibility make it a desirable
surgical implant material. Collagen is purified from bovine skin or tendon and used to fashion a variety of medical
devices including hemostats, implantable gels, drug delivery vehicles and bone substitutes. However, when implanted
into humans bovine collagen can cause acute and delayed immune responses. 

[0011] As a consequence, researchers have attempted to produce human recombinant collagen with all of its structural
attributes in commercial quantities through genetic engineering. Unfortunately, production of collagen by commercial
mass producers of protein such as E. coli has not been successful. A major problem is the extensive post-translational
modification of collagen by enzymes not present in E. coli. Failure of E. coli cells to provide proline hydroxylation
of unhydroxylated collagen proline prevents manufacture of structurally sound collagen in commercial quantities.

[0012] Another problem in attempting to use E. coli to produce human collagen is that E. coli prefer particular codons 
in the production of polypeptides. Although the genetic code is identical in both prokaryotic and eukaryotic organisms, 

the particular codon (of the several possible for most amino acids) that is most commonly utilized can vary widely
between prokaryotes and eukaryotes. See, Wada, K.-N., Y. Wada, F. Ishibashi, T. Gojobori and T. Ikemura. Nucleic 
Acids Res. 20, Supplement 2111-2118, 1992. Efficient expression of heterologous (e.g. mammalian) genes in prokary- 
otes such as E. coli can be adversely affected by the presence in the gene of codons infrequently used in E. coli and 
expression levels of the heterologous protein often rise when rare codons are replaced by more common ones. See, 

e.g., Williams, D.P., D. Regier, D. Akiyoshi, F. Genbauffe and J.R. Murphy. Nucleic Acids Res. 16: 10453-10467, 1988
and Höög, J.-O., H. v. Bahr-Lindström, H. Jörnvall and A. Holmgren. Gene. 43: 13-21, 1986. This phenomenon is
thought to be related, at least in part, to the observation that a low frequency of occurrence of a particular codon
correlates with a low cellular level of the transfer RNA for that codon. See, Ikemura, T. J. Mol. Biol. 158: 573-597, 1982
and Ikemura, T. J. Mol. Biol. 146: 1-21, 1981. Thus, the cellular tRNA level may limit the rate of translation of the codon
and therefore influence the overall translation rate of the full-length protein. See, Ikemura, T. J. Mol. Biol. 146: 1-21,
1981; Bonekamp, F. and F.K. Jensen. Nucleic Acids Res. 16: 3013-3024, 1988; Misra, R. and P. Reeves, Eur. J.
Biochem. 152: 151-155, 1985; and Post, L.E., G.D. Strycharz, M. Nomura, H. Lewis and P.P. Lewis. Proc. Natl. Acad.
Sci. U.S.A. 76: 1697-1701, 1979. In support of this hypothesis is the observation that the genes for abundant E. coli
proteins generally exhibit bias towards commonly used codons that represent highly abundant tRNAs. See, Ikemura,
T. J. Mol. Biol. 146: 1-21, 1981; Bonekamp, F. and F.K. Jensen. Nucleic Acids Res. 16: 3013-3024, 1988; Misra, R. and
P. Reeves, Eur. J. Biochem. 152: 151-155, 1985; and Post, L.E., G.D. Strycharz, M. Nomura, H. Lewis and P.P. Lewis.
Proc. Natl. Acad. Sci. U.S.A. 76: 1697-1701, 1979. In addition to codon frequency, the codon context (i.e., the surrounding
nucleotides) can also affect expression.

[0013] Although it would appear that substituting preferred codons for rare codons could be expected to increase 
expression of heterologous proteins in host organisms, such is not always the case. Indeed, "it has not been possible to
formulate general and unambiguous rules to predict whether the content of low-usage codons in a specific gene might
adversely affect the efficiency of its expression in E. coli." See page 524 of S.C. Makrides (1996), Strategies for Achieving
High-Level Expression of Genes in Escherichia coli. Microbiological Reviews 60, 512-538. For example, in one
case, various gene fusions between yeast α factor and somatomedin C were made that differed only in coding sequence.
In these experiments, no correlation was found between codon bias and expression levels in E. coli. Ernst,
J.F. and Kawashima, E. (1988), J. Biotechnology, 7, 1-10. In another instance, it was shown that despite the higher
frequency of optimal codons in a synthetic β-globin gene compared to the native sequence, no difference was found
in the protein expression from these two constructs when they were placed behind the T7 promoter. Hernan et al.
(1992), Biochemistry, 31, 8619-8628. Conversely, there are many examples of proteins with a relatively high percentage
of rare codons that are well expressed in E. coli. A table listing some of these examples and a general discussion can
be found in Makoff, A.J. et al. (1989), Nucleic Acids Research, 17, 10191-10202. In one case, introduction of non-optimal,
rare arginine codons at the 3' end of a gene actually increased the yield of expressed protein. Gursky, Y.G.
and Beabealashvilli, R.Sh. (1994), Gene 148, 15-21.

[0014] Failure to provide post-translational modifications such as hydroxylation of proline and the presence in human 
collagen of rare codons for E. coli may be contributing to the difficulties encountered in the expression of human 
collagen genes in E. coli. 
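
The codon replacement discussed in the preceding paragraphs can be sketched as follows. This is a minimal illustration, not the procedure used to build any construct described herein: the function name `recode` and the `PREFERRED` table are hypothetical, only the glycine and proline codon families of the standard genetic code are handled, and the preferred codon chosen for each family is an illustrative assumption. As noted above, such substitution does not by itself guarantee higher expression.

```python
# Synonymous codon families for glycine (G) and proline (P)
# in the standard genetic code.
SYNONYMS = {
    "G": {"GGT", "GGC", "GGA", "GGG"},
    "P": {"CCT", "CCC", "CCA", "CCG"},
}

# Hypothetical host preference table (an illustrative assumption,
# not a table taken from this document).
PREFERRED = {"G": "GGC", "P": "CCG"}


def recode(dna: str, preferred: dict = PREFERRED) -> str:
    """Replace each Gly or Pro codon with the preferred synonymous codon.

    Codons for other amino acids are left unchanged, so the encoded
    amino acid sequence is preserved.
    """
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")

    out = []
    for i in range(0, len(dna), 3):
        codon = dna[i:i + 3]
        for amino_acid, family in SYNONYMS.items():
            if codon in family:
                codon = preferred[amino_acid]
                break
        out.append(codon)
    return "".join(out)


if __name__ == "__main__":
    # GGA (Gly), CCT (Pro), GCT (Ala): only the first two are recoded.
    print(recode("GGACCTGCT"))
```

A fuller version would cover all twenty amino acids and could take codon context into account, which the text above identifies as a further factor affecting expression.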

SUMMARY

[0015] A method of incorporating an amino acid analog into a polypeptide produced by a cell is provided which 
includes providing a cell selected from the group consisting of prokaryotic cell and eukaryotic cell, providing growth 
media containing at least one amino acid analog selected from the group consisting of trans-4-hydroxyproline, 3-hydroxyproline,
cis-4-fluoro-L-proline and combinations thereof and contacting the cell with the growth media wherein
the at least one amino acid analog is assimilated into the cell and incorporated into at least one polypeptide. 
[0016] Also provided is a method of substituting an amino acid analog of an amino acid in a polypeptide produced 
by a cell selected from the group consisting of prokaryotic cell and eukaryotic cell, which includes providing a cell 
selected from the group consisting of prokaryotic cell and eukaryotic cell, providing growth media containing at least 

one amino acid analog selected from the group consisting of trans-4-hydroxyproline, 3-hydroxyproline, cis-4-fluoro-L-proline
and combinations thereof and contacting the cell with the growth media wherein the at least one amino acid
analog is assimilated into the cell and incorporated as a substitution for at least one naturally occurring amino acid in 
at least one polypeptide. 

[0017] A method of controlling the amount of an amino acid analog incorporated into a polypeptide is also provided 
which includes providing at least a first cell selected from the group consisting of prokaryotic cell and eukaryotic cell,
providing a first growth media containing a first predetermined amount of at least one amino acid analog selected from
the group consisting of trans-4-hydroxyproline, 3-hydroxyproline, cis-4-fluoro-L-proline and combinations thereof and
contacting the first cell with the first growth media wherein a first amount of amino acid analog is assimilated into the 
first cell and incorporated into at least one polypeptide. At least a second cell selected from the group consisting of 
prokaryotic cell and eukaryotic cell, is also provided along with a second growth media containing a second predetermined
amount of an amino acid analog selected from the group consisting of trans-4-hydroxyproline, 3-hydroxyproline,
cis-4-fluoro-L-proline and combinations thereof and the at least second cell is contacted with the second growth media
wherein a second amount of amino acid analog is assimilated into the second cell and incorporated into at least one 
polypeptide. 

[0018] Also provided is a method of increasing stability of a recombinant polypeptide produced by a cell which includes
providing a cell selected from the group consisting of prokaryotic cell and eukaryotic cell, and providing growth
media containing an amino acid analog selected from the group consisting of trans-4-hydroxyproline, 3-hydroxyproline,
cis-4-fluoro-L-proline and combinations thereof and contacting the cell with the growth media wherein the amino acid
analog is assimilated into the cell and incorporated into a recombinant polypeptide, thereby stabilizing the polypeptide. 

[0019] A method of increasing uptake of an amino acid analog into a cell and causing formation of an amino acid
analog/tRNA complex is also provided which includes providing a cell selected from the group consisting of prokaryotic 
cell and eukaryotic cell, providing hypertonic growth media containing amino acid analog selected from the group 
consisting of trans-4-hydroxyproline, 3-hydroxyproline, cis-4-fluoro-L-proline and combinations thereof and contacting
the cell with the hypertonic growth media wherein the amino acid analog is assimilated into the cell and incorporated 

into an amino acid analog/tRNA complex. In any of the other above methods, a hypertonic growth media can optionally
be incorporated to increase uptake of an amino acid analog into a cell.

[0020] A composition is provided which includes a cell selected from the group consisting of prokaryotic cell and 
eukaryotic cell, and hypertonic media including an amino acid analog selected from the group consisting of
trans-4-hydroxyproline, 3-hydroxyproline, cis-4-fluoro-L-proline and combinations thereof.
[0021] Also provided is a method of producing an Extracellular Matrix Protein (EMP) or a fragment thereof capable
of providing a self-aggregate in a cell which does not ordinarily hydroxylate proline which includes providing a nucleic 
acid sequence encoding the EMP or fragment thereof which has been optimized for expression in the cell by substitution 
of codons preferred by the cell for naturally occurring codons not preferred by the cell, incorporating the nucleic acid 
sequence into the cell, providing hypertonic growth media containing at least one amino acid selected from the group
consisting of trans-4-hydroxyproline and 3-hydroxyproline, and contacting the cell with the growth media wherein the
at least one amino acid is assimilated into the cell and incorporated into the EMP or fragment thereof. 
[0022] Nucleic acid encoding a chimeric protein is provided which includes a domain from a physiologically active 
peptide and a domain from an extracellular matrix protein (EMP) which is capable of providing a self-aggregate. The
nucleic acid may be inserted into a cloning vector which can then be incorporated into a cell. 
[0023] Also provided is a chimeric protein including a domain from a physiologically active peptide and a domain 
from an extracellular matrix protein (EMP) which is capable of providing a self-aggregate.

[0024] Also provided is human collagen produced by a prokaryotic cell, the human collagen being capable of providing
a self-aggregate.

[0025] Also provided is nucleic acid encoding a human Extracellular Matrix Protein (EMP) wherein the codon usage 
in the nucleic acid sequence reflects preferred codon usage in a prokaryotic cell. 

BRIEF DESCRIPTION OF THE DRAWINGS 


[0026] Figure 1 is a plasmid map illustrating pMAL-c2. 

[0027] Figure 2 is a graphical representation of the concentration of intracellular hydroxyproline based upon con- 
centration of trans-4-hydroxyproline in growth culture over time. 

[0028] Figure 2A is a graphical representation of the concentration of intracellular hydroxyproline as a function of 
sodium chloride concentration.

[0029] Figures 3A and 3B depict a DNA sequence encoding human Type I (α1) collagen (SEQ. ID. NO. 1).
[0030] Figure 4 is a plasmid map illustrating pHuCol. 

[0031] Figure 5 depicts a DNA sequence encoding a fragment of human Type I (α1) collagen (SEQ. ID. NO. 2).
[0032] Figure 6 is a plasmid map illustrating pHuCol-FI. 
[0033] Figure 7 depicts a DNA sequence encoding a collagen-like peptide wherein the region coding for gene collagen-like
peptide is underlined (SEQ. ID. NO. 3).

[0034] Figure 8 depicts an amino acid sequence of a collagen-like peptide (SEQ. ID. NO. 4). 
[0035] Figure 9 is a plasmid map illustrating pCLP. 

[0036] Figure 10 depicts a DNA sequence encoding mature bone morphogenic protein (SEQ. ID. NO. 5). 
[0037] Figure 11 is a plasmid map illustrating pCBC.

[0038] Figure 12 is a graphical representation of the percent incorporation of proline and trans-4-hydroxyproline into
maltose binding protein under various conditions. 

[0039] Figure 13 depicts a collagen I (α1)/BMP-2B chimeric amino acid sequence (SEQ. ID. NO. 6).
[0040] Figure 14A-14C depicts a collagen I (α1)/BMP-2B chimeric nucleotide sequence (SEQ. ID. NO. 7).
[0041] Figure 15 depicts a collagen I (α1)/TGF-β1 amino acid sequence (SEQ. ID. NO. 8).

[0042] Figure 16A-16C depict a collagen I (α1)/TGF-β1 nucleotide sequence (SEQ. ID. NO. 9). Lower case lettering
indicates non-coding sequence. 

[0043] Figures 17A-17B depict a collagen I (α1)/decorin amino acid sequence (SEQ. ID. NO. 10).
[0044] Figure 18 depicts a collagen I (α1)/decorin peptide amino acid sequence (SEQ. ID. NO. 11).
[0045] Figures 19A-19D depict a collagen I (α1)/decorin nucleotide sequence (SEQ. ID. NO. 12).

[0046] Figures 20A-20C depict a collagen/decorin peptide nucleotide sequence (SEQ. ID. NO. 13). Lower case lettering
indicates non-coding sequence.

[0047] Figure 21 depicts a pMal cloning vector and polylinker cloning site. 

[0048] Figure 22 depicts a polylinker cloning site contained in the pMal cloning vector of Fig. 21 (SEQ. ID. NO. 14). 
[0049] Figure 23 depicts a pMal cloning vector containing a BMP/collagen nucleotide chimeric construct.

[0050] Figure 24 depicts a pMal cloning vector containing a TGF-β1/collagen nucleotide chimeric construct.

[0051] Figure 25 depicts a pMal cloning vector containing a decorin/collagen nucleotide chimeric construct. 

[0052] Figure 26 depicts a pMal cloning vector containing a decorin peptide/collagen nucleotide chimeric construct. 

[0053] Figure 27A-27E depicts a human collagen Type I (α1) nucleotide sequence (SEQ. ID. NO. 15) and corresponding
amino acid sequence (SEQ. ID. NO. 16).

[0054] Figure 28 is a schematic diagram of the construction of the human collagen gene from synthetic oligonucleotides.

[0055] Figure 29 is a schematic depiction of the amino acid sequence of chimeric proteins GST-ColECol (SEQ. ID. 
NO. 17) and GST-D4 (SEQ. ID. NO. 18). 
[0056] Figure 30 is a Table depicting occurrence of four proline and four glycine codons in the human Collagen Type
I (α1) gene with optimized codon usage (ColECol).

[0057] Figure 31 depicts a gel reflecting expression and dependence of expression of GST-D4 on hydroxyproline. 
[0058] Figure 32 depicts a gel showing expression of GST-D4 in hypertonic media. 
[0059] Figure 33 is a graph showing circular dichroism spectra of native and denatured D4 in neutral phosphate buffer.
[0060] Figure 34 depicts a gel representing digestion of D4 with bovine pepsin. 

[0061] Figure 35 depicts a gel representing expression of GST-H Col and GST-ColECol under specified conditions. 
[0062] Figure 36 depicts a gel representing expression of GST-CM4 in media with or without NaCl and either proline
or hydroxyproline.

[0063] Figure 37 depicts a gel of six hour post induction samples of GST-CM4 expressed in E. coli with varying 
concentrations of NaCl.

[0064] Figure 38 depicts a gel of 4 hour post induction samples of GST-CM4 expressed in E. coli with constant 
amounts of hydroxyproline and varying amounts of proline. 
[0065] Figures 39A-39E depict the nucleotide (SEQ. ID. NO. 19) and amino acid (SEQ. ID. NO. 20) sequence of
HuColEc, the helical region of human Type I (α1) collagen plus 17 amino terminal extra-helical amino acids and 26
carboxy terminal extra-helical amino acids with codon usage optimized for E. coli. 

[0066] Figure 40 depicts sequence and restriction maps of synthetic oligos used to reconstruct the first 243 base 
pairs of the human Type I (α1) collagen gene with optimized E. coli codon usage. The synthetic oligos are labelled
N1-1 (SEQ. ID. NO. 21), N1-2 (SEQ. ID. NO. 22), N1-3 (SEQ. ID. NO. 23) and N1-4 (SEQ. ID. NO. 24).

[0067] Figure 41 depicts a plasmid map of pBSN1-1 containing a 114 base pair fragment of human collagen Type I 
(α1) with optimized E. coli codon usage.

[0068] Figure 42 depicts the nucleotide (SEQ. ID. NO. 25) and amino acid (SEQ. ID. NO. 26) sequence of a fragment 
of human collagen Type I (α1) gene with optimized E. coli codon usage encoded by plasmid pBSN1-1.
[0069] Figure 43 depicts a plasmid map of pBSN1-2 containing a 243 base pair fragment of human collagen Type I
(α1) with optimized E. coli codon usage.

[0070] Figure 44 depicts the nucleotide (SEQ. ID. NO. 27) and amino acid (SEQ. ID. NO. 28) sequence of a fragment 
of human collagen Type I (α1) gene with optimized E. coli codon usage encoded by plasmid pBSN1-2.
[0071] Figure 45 depicts a plasmid map of pHuColEc containing human collagen Type I (α1) with optimized E. coli
codon usage.

[0072] Figure 46 depicts a plasmid map of pTrc N1-2 containing a 234 nucleotide human collagen Type I (α1) fragment
with optimized E. coli codon usage. 

[0073] Figure 47 depicts a plasmid map of pN1-3 containing a 360 nucleotide human collagen Type I (α1) fragment
with optimized E. coli codon usage. 
[0074] Figure 48 depicts a plasmid map of pD4 containing a 657 nucleotide human collagen Type I (α1) 3' fragment
with optimized E. coli codon usage. 

[0075] Figures 49A-49E depict the nucleotide (SEQ. ID. NO. 29) and amino acid (SEQ. ID. NO. 30) sequence of a 
helical region of human Type I (α2) collagen plus 11 amino terminal extra-helical amino acids and 12 carboxy terminal
extrahelical amino acids. 

[0076] Figures 50A-50E depict the nucleotide (SEQ. ID. NO. 31) and amino acid (SEQ. ID. NO. 32) sequence of
HuCol(α2)Ec, the helical region of human Type I (α2) collagen plus 11 amino terminal extra-helical amino acids and 12
carboxy terminal extra-helical amino acids with codon usage optimized for E. coli. 

[0077] Figure 51 depicts sequence and restriction maps of synthetic oligos used to reconstruct the first 240 base 
pairs of human Type I (α2) collagen gene with optimized E. coli codon usage. The synthetic oligos are labelled N1-1
(α2) (SEQ. ID. NO. 33), N1-2 (α2) (SEQ. ID. NO. 34), N1-3 (α2) (SEQ. ID. NO. 35) and N1-4 (α2) (SEQ. ID. NO. 36).
[0078] Figure 52 depicts a plasmid map of pBSN1-1 (α2) containing a 117 base pair fragment of human collagen Type
I (α2) with optimized E. coli codon usage.

[0079] Figure 53 depicts a plasmid map of pBSN1-2 (α2) containing a 240 base pair fragment of human collagen
Type I (α2) with optimized E. coli codon usage.
[0080] Figure 54 depicts the nucleotide (SEQ. ID. NO. 37) and amino acid (SEQ. ID. NO. 38) sequence of a fragment
of human collagen Type I (α2) gene with optimized E. coli codon usage encoded by plasmid pBSN1-2 (α2).
[0081] Figure 55 depicts a plasmid map of pHuCol(α2)Ec containing the entire human collagen Type I (α2) gene with
optimized E. coli codon usage. 

[0082] Figure 56 depicts a plasmid map of pN1-2 (α2) containing a 240 base pair fragment of human collagen Type
I (α2) with optimized E. coli codon usage.

[0083] Figure 57 depicts a gel reflecting expression of GST and TGF-β1 under specified conditions.
[0084] Figure 58 depicts a gel reflecting expression of MBP, FN-BMP-2A, FN-TGF-β1 and FN under specified conditions.

[0085] Figure 59 depicts a gel showing expression of GST-Coll under specified conditions. 
[0086] Figure 60 depicts a plasmid map of pGST-CM4 containing the gene for glutathione S-transferase fused to
the gene for collagen mimetic 4. 

[0087] Figure 61 depicts the nucleotide (SEQ. ID. NO. 39) and amino acid (SEQ. ID. NO. 40) sequence of collagen 
mimetic 4. 
[0088] Figure 62A depicts a chromatogram of the elution of hydroxyproline containing collagen mimetic 4 from a
Poros RP2 column. The arrow indicates the peak containing hydroxyproline containing collagen mimetic 4. 
[0089] Figure 62B depicts a chromatogram of the elution of proline-containing collagen mimetic 4 from a Poros RP2 
column. The arrow indicates the peak containing proline containing collagen mimetic 4. 
[0090] Figure 63A depicts a chromatogram of a proline amino acid standard (250 pmol).

[0091] Figure 63B depicts a chromatogram of a hydroxyproline amino acid standard (250 pmol). 

[0092] Figure 63C depicts an amino acid analysis chromatogram of the hydrolysis of proline containing collagen 

mimetic 4. 

[0093] Figure 63D depicts an amino acid analysis chromatogram of the hydrolysis of hydroxyproline containing collagen
mimetic 4.

[0094] Figure 64 is a graph of OD600 versus time for cultures of E. coli JM109 (F-) grown to plateau and then 
supplemented with various amino acids. 

[0095] Figure 65 depicts a plasmid map of pcEc-a1 containing the gene for HuCol(α1)Ec.

[0096] Figure 66 depicts a plasmid map of pcEc-a2 containing the gene for HuCol(α2)Ec.
[0097] Figure 67 depicts a plasmid map of pD4-a1 containing the gene for a 219 amino acid C-terminal fragment of
Type I (α1) human collagen with optimized E. coli codon usage fused to the gene for glutathione S-transferase.

[0098] Figure 68 depicts a plasmid map of pD4-a2 containing the gene for a 207 amino acid C-terminal fragment of
Type I (α2) human collagen with optimized E. coli codon usage fused to the gene for glutathione S-transferase.

[0099] Figure 69 depicts the predicted amino acid sequence from the DNA sequence of the first 13 amino acids
of protein D4-α1 (SEQ. ID. NO. 41) and the amino acid sequence as experimentally determined (SEQ. ID. NO. 42).

[0100] Figure 70 depicts the mass spectrum of hydroxyproline containing D4-α1.

[0101] Figure 71 depicts the nucleotide sequence of a 657 nucleotide human collagen Type I (α1) 3' fragment with
optimized E. coli codon usage designated D4 (SEQ. ID. NO. 43).

[0102] Figure 72 depicts the amino acid sequence of a 219 amino acid C-terminal fragment of human collagen Type
I (α1) designated D4 (SEQ. ID. NO. 44).

[0103] Figure 73 is a plasmid map illustrating pGEX-4T.1 containing the gene for glutathione S-transferase.
[0104] Figure 74 is a plasmid map illustrating pTrc-TGF containing the gene for the mature human TGF-β1 polypeptide.

[0105] Figure 75 is a plasmid map illustrating pTrc-Fn containing the gene for a 70 kDa fragment of human fibronectin.
[0106] Figure 76 is a plasmid map illustrating pTrc-Fn-TGF containing the gene for a fusion protein of a 70 kDa
fragment of human fibronectin and the mature human TGF-β1 polypeptide.

[0107] Figure 77 is a plasmid map illustrating pTrc-Fn-BMP containing the gene for a fusion protein of a 70 kDa 
fragment of human fibronectin and human bone morphogenic protein 2A. 

[0108] Figure 78 is a plasmid map illustrating pGEX-HuColl Ec containing the gene for a fusion between glutathione 
S-transferase and Type I (α1) human collagen with optimized E. coli codon usage.

[0109] Figure 79 depicts the nucleotide sequence of a 627 nucleotide human collagen Type I (α2) 3' fragment with
optimized E. coli codon usage (SEQ. ID. NO. 45).

[0110] Figure 80 depicts the amino acid sequence of a 209 amino acid C-terminal fragment of human collagen Type
I (α2) (SEQ. ID. NO. 46).

[0111] Figure 81 depicts the sequence of synthetic oligos used to reconstruct the first 282 base pairs of the gene for
the carboxy terminal 219 amino acids of human Type I (α1) collagen with optimized E. coli codon usage designated
N4-1 (SEQ. ID. NO. 47), N4-2 (SEQ. ID. NO. 48), N4-3 (SEQ. ID. NO. 49) and N4-4 (SEQ. ID. NO. 50).

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS 


[0112] Prokaryotic cells and eukaryotic cells can unexpectedly be made to assimilate and incorporate trans-4-hydroxyproline
into proteins contrary to both Papas et al. and Deming et al., supra. Such assimilation and incorporation
is especially useful when the structure and function of a polypeptide depends on post-translational hydroxylation of
proline not provided by the native protein production system of a recombinant host. Thus, prokaryotic bacteria such
as E. coli and eukaryotic cells such as Saccharomyces cerevisiae, Saccharomyces carlsbergensis and Schizosaccharomyces
pombe that ordinarily do not hydroxylate proline, and additional eukaryotes such as insect cells including
lepidopteran cell lines including Spodoptera frugiperda, Trichoplusia ni, Heliothis virescens and Bombyx mori infected with
a baculovirus; CHO cells, COS cells and NIH 3T3 cells which fail to adequately produce certain polypeptides whose
structure and function depend on such hydroxylation, can be made to produce polypeptides having hydroxylated
prolines. Incorporation includes adding trans-4-hydroxyproline to a polypeptide, for example, by first changing an amino
acid to proline, creating a new proline position that can in turn be substituted with trans-4-hydroxyproline, or substituting
a naturally occurring proline in a polypeptide with trans-4-hydroxyproline as well.

[0113] The process of producing recombinant polypeptides in mass producing organisms is well known. Replicable 
expression vectors such as plasmids, viruses, cosmids and artificial chromosomes are commonly used to transport
genes encoding desired proteins from one host to another. It is contemplated that any known method of cloning a gene, 
ligating the gene into an expression vector and transforming a host cell with such expression vector can be used in 
furtherance of the present disclosure. 

[0114] Not only is incorporation of trans-4-hydroxyproline into polypeptides which depend upon trans-4-hydroxyproline
for chemical and physical properties useful in production systems which do not have the appropriate systems for
converting proline to trans-4-hydroxyproline, but it is useful as well in studying the structure and function of polypeptides
which do not normally contain trans-4-hydroxyproline. It is contemplated that the following amino acid analogs may
also be incorporated in accordance with the present disclosure: trans-4-hydroxyproline, 3-hydroxyproline, cis-4-fluoro-L-proline
and combinations thereof (hereinafter referred to as the "amino acid analogs"). Use of prokaryotes and
eukaryotes is desirable since they allow relatively inexpensive mass production of such polypeptides. It is contemplated
that the amino acid analogs can be incorporated into any desired polypeptide. In a preferred embodiment the prokaryotic
cells and eukaryotic cells are starved for proline by decreasing or eliminating the amount of proline in growth media
prior to addition of an amino acid analog herein.

[0115] Expression vectors containing the gene for maltose binding protein (MBP), e.g., see Figure 1 illustrating plasmid
pMAL-c2, commercially available from New England Biolabs, are transformed into prokaryotes such as E. coli
proline auxotrophs or eukaryotes such as S. cerevisiae auxotrophs which depend upon externally supplied proline for
protein synthesis and anabolism. Other preferred expression vectors for use in prokaryotes are commercially available
plasmids which include pKK-223 (Pharmacia), pTRC (Invitrogen), pGEX (Pharmacia), pET (Novagen) and pQE (Qiagen).
It should be understood that any suitable expression vector may be utilized by those with skill in the art.
[0116] Substitution of the amino acid analogs for proline in protein synthesis occurs since prolyl tRNA synthetase is
sufficiently promiscuous to allow misacylation of proline tRNA with any one of the amino acid analogs. A sufficient
quantity, i.e., typically ranging from about 0.001 M to about 1.0 M, but more preferably from about 0.005 M to about 0.5 M,
of the amino acid analog(s) is added to the growth medium for the transformed cells to compete with proline in cellular
uptake. After sufficient time, generally from about 30 minutes to about 24 hours or more, the amino acid analog(s) is
assimilated by the cell and incorporated into protein synthetic pathways. As can be seen from Figures 2 and 2A,
intracellular concentration of trans-4-hydroxyproline increases as the concentration of sodium chloride in
the growth media is increased. In a preferred embodiment the prokaryotic cells and/or eukaryotic cells are starved for proline by
decreasing or eliminating the amount of proline in growth media prior to addition of an amino acid analog herein.
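As a worked illustration of the concentration ranges given above, the following minimal sketch converts a target molarity into the mass of analog to weigh out. It assumes the analog is trans-4-hydroxyproline (molar mass about 131.13 g/mol for C5H9NO3); the function name and the 1 L culture volume are hypothetical choices made only for this example and are not taken from the disclosure.

```python
# Molar mass of trans-4-hydroxyproline (C5H9NO3) in g/mol.
# Assumption for this sketch: other analogs would need their own value.
HYP_MOLAR_MASS = 131.13


def grams_needed(molarity, volume_l, molar_mass=HYP_MOLAR_MASS):
    """Return the mass (g) of analog giving `molarity` (mol/L) in `volume_l` litres."""
    if molarity < 0 or volume_l < 0:
        raise ValueError("molarity and volume must be non-negative")
    return molarity * volume_l * molar_mass


if __name__ == "__main__":
    # Bounds of the typical and the more preferred ranges stated in the text.
    for label, molarity in [("typical low", 0.001), ("preferred low", 0.005),
                            ("preferred high", 0.5), ("typical high", 1.0)]:
        print(f"{label:>14}: {molarity} M -> {grams_needed(molarity, 1.0):.3f} g per litre")
```

The calculation is a plain molarity-to-mass conversion; it says nothing about uptake or incorporation efficiency, which the text attributes to competition with proline and to media osmolality.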
[0117] Expression vectors containing the gene for human Type I (α1) collagen (DNA sequence illustrated in Figures
3 and 3A; plasmid map illustrated in Figure 4) are transformed into prokaryotic or eukaryotic proline auxotrophs which
depend upon externally supplied proline for protein synthesis and anabolism. As above, substitution of the amino acid
analog(s) occurs since prolyl tRNA synthetase is sufficiently promiscuous to allow misacylation of proline tRNA with
the amino acid analog(s). The quantity of amino acid analog(s) in media given above is again applicable.
[0118] Expression vectors containing DNA encoding fragments of human Type I (α1) collagen (e.g., DNA sequence
illustrated in Figure 5 and plasmid map illustrated in Figure 6) are transformed into prokaryotic or eukaryotic auxotrophs
as above. Likewise, expression vectors containing DNA encoding collagen-like polypeptide (e.g., DNA sequence illustrated
in Figure 7, amino acid sequence illustrated in Figure 8 and plasmid map illustrated in Figure 9) can be used
to transform prokaryotic or eukaryotic auxotrophs as above. Collagen-like peptides are those which contain at least 
partial homology with collagen and exhibit similar chemical and physical characteristics to collagen. Thus, collagen- 
like peptides consist, e.g., of repeating arrays of Gly-X-Y triplets in which about 35% of the X and Y positions are 
occupied by proline and 4-hydroxyproline. Collagen-like peptides are interchangeably referred to herein as collagen- 
like proteins, collagen-like polypeptides, collagen mimetic polypeptides and collagen mimetic. Certain preferred colla- 
gen fragments and collagen-like peptides in accordance herewith are capable of assembling into an extracellular matrix. 
In both collagen fragments and collagen-like peptides as described above, substitution with amino acid analog(s) occurs 
since prolyl tRNA synthetase is sufficiently promiscuous to allow misacylation of proline tRNA with one or more of the 
amino acid analog(s). The quantity of amino acid analog(s) given above is again applicable. 
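The Gly-X-Y description above can be checked against a candidate sequence with a short script. This is a minimal sketch under stated assumptions: the sequence is written in one-letter code, "O" is used to denote 4-hydroxyproline (a common convention, not one defined in this disclosure), and the function name and example peptide are hypothetical.

```python
def xy_imino_fraction(seq, imino="PO"):
    """Fraction of X and Y positions in Gly-X-Y triplets held by proline (P)
    or hydroxyproline (O, an assumed one-letter symbol)."""
    if not seq or len(seq) % 3 != 0:
        raise ValueError("sequence must be a non-empty whole number of triplets")
    hits = 0
    for i in range(0, len(seq), 3):
        gly, x, y = seq[i:i + 3]
        if gly != "G":
            raise ValueError(f"triplet starting at position {i + 1} does not begin with Gly")
        # Booleans count as 0/1, so this adds the number of imino residues.
        hits += (x in imino) + (y in imino)
    xy_positions = 2 * (len(seq) // 3)
    return hits / xy_positions


if __name__ == "__main__":
    example = "GPOGPAGEKGAR"  # hypothetical collagen-like stretch of four triplets
    print(f"{xy_imino_fraction(example):.3f}")
```

For the hypothetical peptide shown, three of the eight X and Y positions hold proline or hydroxyproline, which is close to the roughly 35% figure mentioned above.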
[0119] It is contemplated that any polypeptide having an extracellular matrix protein domain such as a collagen, 
collagen fragment or collagen-like peptide domain can be made to incorporate amino acid analog(s) in accordance 
with the disclosure herein. Such polypeptides include collagen, a collagen fragment or collagen-like peptide domain 
and a domain having a region incorporating one or more physiologically active agents such as glycoproteins, proteins, 
peptides and proteoglycans. As used herein, physiologically active agents exert control over or modify existing phys- 
iologic functions in living things. Physiologically active agents include hormones, growth factors, enzymes, ligands and 
receptors. Many active domains of physiologically active agents have been defined and isolated. It is contemplated 
that polypeptides having a collagen, collagen fragment or collagen-like peptide domain can also have a domain incor- 
porating one or more physiologically active domains which are active fragments of such physiologically active agents. 
As used herein, physiologically active agent is meant to include entire peptides, polypeptides, proteins, glycoproteins, 
proteoglycans and active fragments of any of them. Thus, chimeric proteins are made to incorporate amino acid analog(s)
by transforming a prokaryotic proline auxotroph or a eukaryotic proline auxotroph with an appropriate expression
vector and contacting the transformed auxotroph with growth media containing at least one of the amino acid analogs.
For example, a chimeric collagen/bone morphogenic protein (BMP) construct or various chimeric collagen/growth factor
constructs are useful in accordance herein. Such growth factors are well-known and include insulin-like growth factor,
transforming growth factor, platelet derived growth factor and the like. Figure 10 illustrates DNA of BMP which can be
fused to the 3' terminus of DNA encoding collagen, DNA encoding a collagen fragment or DNA encoding a collagen-like
peptide. Figure 11 illustrates a map of plasmid pCBC containing a collagen/BMP construct. In a preferred embodiment,
proteins having a collagen, collagen fragment or collagen-like peptide domain assemble or aggregate to form
an extracellular matrix which can be used as a surgical implant. The property of self-aggregation as used herein includes
the ability to form an aggregate with the same or similar molecules or to form an aggregate with different molecules
that share the property of aggregation to form, e.g., a double or triple helix. An example of such aggregation is the
structure of assembled collagen matrices.

[0120] Indeed, chimeric polypeptides which may also be referred to herein as chimeric proteins provide an integrated
combination of a therapeutically active domain from a physiologically active agent and one or more EMP moieties. The
EMP domain provides an integral vehicle for delivery of the therapeutically active moiety to a target site. The two
domains are linked covalently by one or more peptide bonds contained in a linker region. As used herein, integrated
or integral means characteristics which result from the covalent association of one or more domains of the chimeric
proteins. The therapeutically active moieties disclosed herein are typically made of amino acids linked to form peptides,
polypeptides, proteins, glycoproteins or proteoglycans. As used herein, peptide encompasses polypeptides and
proteins.

[0121] The inherent characteristics of EMPs are ideal for use as a vehicle for the therapeutic moiety. One such
characteristic is the ability of the EMPs to form the self-aggregate. Examples of suitable EMPs are collagen, elastin,
fibronectin, fibrinogen and fibrin. Fibrillar collagens (Types I, II and III) assemble into ordered polymers and often
aggregate into larger bundles. Type IV collagen assembles into sheetlike meshworks. Elastin molecules form filaments
and sheets in which the elastin molecules are highly cross-linked to one another to provide good elasticity and high
tensile strength. The cross-linked, random-coiled structure of the fiber network allows it to stretch and recoil like a
rubber band. Fibronectin is a large fibril forming glycoprotein, which, in one of its forms, consists of highly insoluble
fibrils cross-linked to each other by disulfide bonds. Fibrin is an insoluble protein formed from fibrinogen by the
proteolytic activity of thrombin during the normal clotting of blood.

[0122] The molecular and macromolecular morphology of the above EMPs defines networks or matrices to provide
substratum or scaffolding in integral covalent association with the therapeutically active moiety. The networks or
matrices formed by the EMP domain provide an environment particularly well suited for ingrowth of autologous cells
involved in growth, repair and replacement of existing tissue. The integral therapeutically active moieties covalently
bound within the networks or matrices provide maximum exposure of the active agents to their targets to elicit a desired
response.

[0123] Implants formed of or from the present chimeric proteins provide sustained release activity in or at a desired
locus or target site. Since it is linked to an EMP domain, the therapeutically active domain of the present chimeric
protein is not free to separately diffuse or otherwise be transported away from the vehicle which carries it, absent
cleavage of peptide bonds. Consequently, chimeric proteins herein provide an effective anchor for therapeutic activity
which allows the activity to be confined to a target location for a prolonged duration. Because the supply of therapeutically
active agent does not have to be replenished as often when compared to non-sustained release dosage forms,
smaller amounts of therapeutically active agent may be used over the course of therapy. Consequently, certain advantages
provided by the present chimeric proteins are a decrease or elimination of local and systemic side effects, less
potentiation or reduction in therapeutic activity with chronic use, and minimization of drug accumulation in body tissue
with chronic dosing.

[0124] Use of recombinant technology allows manufacturing of non-immunogenic chimeric proteins. The DNA en- 
coding both the therapeutically active moiety and the EMP moiety should preferably be derived from the same species 
as the patient being treated to avoid an immunogenic reaction. For example, if the patient is human, the therapeutically 
active moiety as well as the EMP moiety is preferably derived from human DNA. 

[0125] Osteogenic/EMP chimeric proteins provide biodegradable and biocompatible agents for inducing bone formation
at a desired site. As stated above, in one embodiment, a BMP moiety is covalently linked with an EMP to form
a chimeric protein. The BMP moiety induces osteogenesis and the extracellular matrix protein moiety provides an integral
substratum or scaffolding for the BMP moiety and cells which are involved in reconstruction and growth. Compositions
containing the BMP/EMP chimeric protein provide effective sustained release delivery of the BMP moiety to desired
target sites. The method of manufacturing such an osteogenic agent is efficient because the need for extra
time-consuming steps, such as purifying the EMP and then admixing it with the purified BMP, is eliminated. An added advantage of the
BMP/EMP chimeric protein results from the stability created by the covalent bond between BMP and the EMP, i.e., the
BMP portion is not free to separately diffuse away from the EMP, thus providing a more stable therapeutic agent.






EP 0 992 586 A2 



[0126] Bone morphogenic proteins are a class of proteins identified as BMP-1 through BMP-9. A preferred osteogenic protein for
use in human patients is human BMP-2B. A BMP-2B/collagen IA chimeric protein is illustrated in Fig. 13 (SEQ. ID.
NO. 6). The protein sequence illustrated in Fig. 15 (SEQ. ID. NO. 8) includes a collagen helical domain depicted at
amino acids 1-1057 and a mature form of BMP-2B at amino acids 1060-1169. The physical properties of the chimeric
protein are dominated in part by the EMP component. In the case of a collagen moiety, a concentrated solution of
chimeric protein will have a gelatinous consistency that allows easy handling by the medical practitioner. The EMP
moiety acts as a sequestering agent to prevent rapid desorption of the BMP moiety from the desired site and to provide
sustained release of BMP activity. As a result, the BMP moiety remains at the desired site and provides sustained
release of BMP activity at the desired site for a period of time necessary to effectively induce bone formation. The EMP
moiety also provides a matrix which allows a patient's autologous cells, e.g., chondrocytes and the like, which are
normally involved in osteogenesis to collect therein and form an autologous network for new tissue growth. The gelatinous
consistency of the chimeric protein also provides a useful and convenient therapeutic manner for immobilizing
active BMP on a suitable vehicle or implant for delivering the BMP moiety to a site where bone growth is desired.
[0127] The BMP moiety and the EMP moiety are optionally linked together by linker sequences of amino acids.
Examples of linker sequences used are illustrated within the sequence depicted in Figs. 14A-14C (SEQ. ID. NO. 7),
16A-16C (SEQ. ID. NO. 9), 19A-19C (SEQ. ID. NO. 12) and 20A-20C (SEQ. ID. NO. 13), and are described in more
detail below. Linker sequences may be chosen based on particular properties which they impart to the chimeric protein.
For example, amino acid sequences such as Ile-Glu-Gly-Arg and Leu-Val-Pro-Arg are cleaved by Factor Xa and
thrombin enzymes, respectively. Incorporating sequences which are cleaved by proteolytic enzymes into chimeric
proteins herein provides cleavage at the linker site upon exposure to the appropriate enzyme and separation of the two
domains into separate entities. It is contemplated that numerous linker sequences can be incorporated into any of the
chimeric proteins.
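To make the linker idea concrete, the following minimal sketch scans a protein sequence in one-letter code for the two recognition sequences named above (Ile-Glu-Gly-Arg as IEGR, Leu-Val-Pro-Arg as LVPR). The function name and the example sequences are hypothetical and are not taken from the sequence listings; the sketch only locates the motifs and does not model enzyme kinetics or secondary cleavage sites.

```python
# Recognition sequences named in the text, in one-letter code.
LINKER_SITES = {"Factor Xa": "IEGR", "thrombin": "LVPR"}


def find_linker_sites(protein):
    """Return (enzyme, first_residue, last_residue) for every motif occurrence,
    using 1-based inclusive residue numbers, sorted by position."""
    hits = []
    for enzyme, motif in LINKER_SITES.items():
        start = protein.find(motif)
        while start != -1:
            hits.append((enzyme, start + 1, start + len(motif)))
            start = protein.find(motif, start + 1)
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    # Hypothetical chimera: short collagen-like stretch, linker, then a payload.
    chimera = "GPPGPP" + "IEGR" + "QAKHK"
    for enzyme, first, last in find_linker_sites(chimera):
        print(f"{enzyme} site at residues {first}-{last}")
```

A scan of this kind could also be used to confirm that a chosen recognition sequence does not occur by chance inside either domain of a designed chimera.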

[0128] In another embodiment, a chimeric DNA construct includes a gene encoding an osteogenic protein or a fragment
thereof linked to a gene encoding an EMP or a fragment thereof. The gene sequences for various BMPs are known,
see, e.g., U.S. Patent Nos. 4,294,753, 4,761,471, 5,106,748, 5,187,076, 5,141,905, 5,108,922, 5,116,738 and
5,168,050, each incorporated herein by reference. A BMP-2B gene for use herein is synthesized by ligating oligonucleotides
encoding a BMP protein. The oligonucleotides encoding BMP-2B are synthesized using an automated DNA
synthesizer (Beckman Oligo-1000). In a preferred embodiment, the nucleotide sequence encoding the BMP is maximized
for expression in E. coli. This is accomplished by using E. coli utilization tables to translate the sequence of amino acids
of the BMP into codons that are utilized most often by E. coli. Alternatively, native DNA encoding BMP isolated from
mammals including humans may be purified and used.
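The back-translation step described above can be sketched as a simple table lookup. The table below is an illustrative assumption: it lists one commonly cited high-frequency E. coli codon per amino acid and is not the utilization table used in this disclosure; in practice it would be replaced with a measured codon usage table for the host strain. The function name and example peptide are likewise hypothetical.

```python
# Illustrative one-codon-per-residue table (assumption, see lead-in).
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}
STOP_CODON = "TAA"


def back_translate(protein, add_stop=False):
    """Translate a one-letter protein sequence into DNA using the table above."""
    try:
        dna = "".join(PREFERRED_CODON[residue] for residue in protein.upper())
    except KeyError as err:
        raise ValueError(f"no codon defined for residue {err.args[0]!r}") from None
    return dna + (STOP_CODON if add_stop else "")


if __name__ == "__main__":
    peptide = "MGPK"  # hypothetical four-residue example
    print(back_translate(peptide, add_stop=True))
```

Always choosing the single most frequent codon is the simplest reading of the passage; it ignores considerations such as repeated sequence, restriction sites and mRNA structure, which a real gene design would also need to address.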

[0129] The BMP gene and the DNA sequence encoding an extracellular matrix protein are cloned by standard genetic 
engineering methods as described in Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor 
1989, hereby incorporated by reference. 
[0130] The DNA sequence corresponding to the helical and telopeptide region of collagen I(α1) is cloned from a
human fibroblast cell line. Two sets of polymerase chain reactions are carried out using cDNA prepared by standard
methods from AG02261A cells. The first pair of PCR primers includes a 5' primer bearing an XmnI linker sequence and
a 3' primer bearing the BsmI site at nucleotide number 1722. The resulting PCR product consists of sequence from
position 1 to 1722. The second pair of primers includes the BsmI site at 1722 and a linker sequence at the 3' end
bearing a BglII site. The resulting PCR product consists of sequence from position 1722 to 3196. The complete sequence
is assembled by standard cloning techniques. The two PCR products are ligated together at the BsmI site, and
the combined clone is inserted into any vector with XmnI-BglII sites such as pMAL-c2 vector.
[0131] To clone the BMP-2B gene, total cellular RNA is isolated from human osteosarcoma cells (U-2OS) by the
method described by Robert E. Farrel Jr. (Academic Press, CA, 1993 pp. 68-69) (herein incorporated by reference).
The integrity of the RNA is verified by spectrophotometric analysis and electrophoresis through agarose gels. Typical
yields of total RNA are 50 µg from a 100 mm confluent tissue culture dish. The RNA is used to generate cDNA by
reverse transcription using the Superscript pre-amplification system by Gibco BRL. The cDNA is used as template for
PCR amplification using upstream and downstream primers specific for BMP-2B (GenBank HUMBMP2B accession
#M22490). The resulting PCR product consists of BMP-2B sequence from position 1289-1619. The PCR product is
resolved by electrophoresis through agarose gels, purified with GeneClean (BIO 101) and ligated into pMal-c2 vector
(New England Biolabs). The domain of human collagen I(α1) chain is cloned in a similar manner. However, the total
cellular RNA is isolated from a human fibroblast cell line (AG02261A human skin fibroblasts).
[0132] A chimeric BMP/EMP DNA construct is obtained by ligating a synthetic BMP gene to a DNA sequence encoding
an EMP such as collagen, fibrinogen, fibrin, fibronectin, elastin or laminin. However, chimeric polypeptides
herein are not limited to these particular proteins. Figs. 14A-14C (SEQ. ID. NO. 7) illustrate a DNA construct which
encodes a BMP-2B/collagen I(α1) chimeric protein. The coding sequence for an EMP may be ligated upstream and/or
downstream and in-frame with a coding sequence for the BMP. The DNA encoding an EMP may be a portion of the
gene or an entire EMP gene. Furthermore, two different EMPs may be ligated upstream and downstream from the BMP.
[0133] The BMP-2B/collagen I(α1) chimeric protein illustrated in Figs. 14A-14C includes an XmnI linker sequence at
base pairs (bp) 1-19, a collagen domain (bp 20-3190), a BglII/BamHI linker sequence (bp 3191-3196), a mature form
of BMP-2B (bp 3197-3529) and a HindIII linker sequence (bp 3530-3535).
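The base pair coordinates listed above can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: the segment table is transcribed from the paragraph above, while the function name and the report format are hypothetical. It verifies that the segments are contiguous and reports each segment's length and, where the length is a multiple of three, the number of codons.

```python
# (name, first bp, last bp), transcribed from the construct description above.
SEGMENTS = [
    ("XmnI linker", 1, 19),
    ("collagen domain", 20, 3190),
    ("BglII/BamHI linker", 3191, 3196),
    ("mature BMP-2B", 3197, 3529),
    ("HindIII linker", 3530, 3535),
]


def check_layout(segments):
    """Return {name: (length_bp, codons_or_None)}; raise on a gap or overlap."""
    report = {}
    expected_start = segments[0][1]
    for name, start, end in segments:
        if start != expected_start:
            raise ValueError(f"{name} starts at {start}, expected {expected_start}")
        length = end - start + 1
        report[name] = (length, length // 3 if length % 3 == 0 else None)
        expected_start = end + 1
    return report


if __name__ == "__main__":
    for name, (length, codons) in check_layout(SEGMENTS).items():
        note = f"{codons} codons" if codons is not None else "not a whole number of codons"
        print(f"{name:<20} {length:>5} bp  {note}")
```

Under these coordinates the collagen domain spans 3171 bp, or 1057 codons, which agrees with the collagen helical domain at amino acids 1-1057 described earlier, and the 6 bp linker accounts for two residues. The BMP-2B segment spans 333 bp, or 111 codons, one more than the 110 residues at positions 1060-1169; the extra codon may be a stop codon, although the text does not say so.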

[0134] Any combination of growth factor and matrix protein sequences is contemplated, including repeating units
or multiple arrays of each segment in any order.

[0135] Incorporation of fragments of both matrix and growth factor proteins is also contemplated. For example, in
the case of collagen, only the helical domain may be included. Other matrix proteins have defined domains, such as
laminin, which has EGF-like domains. In these cases, specific functionalities can be chosen to achieve desired effects.
Moreover, it may be useful to combine domains from disparate matrix proteins, such as the helical region of collagen
and the cell attachment regions of fibronectin. In the case of growth factors, specific segments have been shown to
be removed from the mature protein by post-translational processing. Chimeric proteins can be designed to include
only the mature biologically active region. For example, in the case of BMP-2B only the final 110 amino acids are found
in the active protein.

[0136] In another embodiment, a transforming growth factor (TGF) moiety is covalently linked with an EMP to form
a chimeric protein. The TGF moiety increases efficacy of the body's normal soft tissue repair response and also induces
osteogenesis. Consequently, TGF/EMP chimeric proteins may be used for either or both functions. One of the fundamental
properties of the TGF-βs is their ability to turn on various activities that result in the synthesis of new connective
tissue. See Piez and Sporn eds., Transforming Growth Factor-βs: Chemistry, Biology and Therapeutics, Annals of the
New York Academy of Sciences, Vol. 593 (1990). TGF-β is known to exist in at least five different isoforms. The DNA
sequence for human TGF-β1 is known and has been cloned. See Derynck et al., Human Transforming Growth Factor-Beta
cDNA Sequence and Expression in Tumour Cell Lines, Nature, Vol. 316, pp. 701-705 (1985), herein incorporated
by reference. TGF-β2 has been isolated from bovine bone, human glioblastoma cells and porcine platelets. TGF-β3
has also been cloned. See ten Dijke et al., Identification of a New Member of the Transforming Growth Factor-β Gene
Family, Proc. Natl. Acad. Sci. (USA), Vol. 85, pp. 4715-4719 (1988), herein incorporated by reference.
[0137] A TGF-β/EMP chimeric protein incorporates the known activities of TGF-βs and provides integral scaffolding
or substratum of the EMP as described above to yield a composition which further provides sustained release focal
delivery at target sites.

[0138] The TGF-β moiety and the EMP moiety are optionally linked together by linker sequences of amino acids.
Linker sequences may be chosen based upon particular properties which they impart to the chimeric protein. For
example, amino acid sequences such as Ile-Glu-Gly-Arg and Leu-Val-Pro-Arg are cleaved by Factor Xa and thrombin
enzymes, respectively. Incorporating sequences which are cleaved by proteolytic enzymes into the chimeric protein
provides cleavage at the linker site upon exposure to the appropriate enzyme and separation of the domains into
separate entities. Fig. 15 depicts an amino acid sequence for a TGF-β1/collagen IA chimeric protein (SEQ. ID. NO. 8).
The illustrated amino acid sequence includes the collagen domain (1-1057) and a mature form of TGF-β1 (1060-1171).

[0139] A chimeric DNA construct includes a gene encoding TGF-β1 or a fragment thereof, or a gene encoding
TGF-β2 or a fragment thereof, or a gene encoding TGF-β3 or a fragment thereof, ligated to a DNA sequence encoding an
EMP protein such as collagen (I-IV), fibrin, fibrinogen, fibronectin, elastin or laminin. A preferred chimeric DNA construct
combines DNA encoding TGF-β1, a DNA linker sequence, and DNA encoding collagen IA. A chimeric DNA construct
containing a TGF-β1 gene and a collagen I(α1) gene is shown in Figs. 16A-16C (SEQ. ID. NO. 9). The illustrated construct
includes an XmnI linker sequence (bp 1-19), DNA encoding a collagen domain (bp 20-3190), a BglII linker sequence
(bp 3191-3196), DNA encoding a mature form of TGF-β1 (bp 3197-3535), and an XbaI linker sequence (bp 3536-3541).
[0140] The coding sequence for EMP may be ligated upstream and/or downstream and in-frame with a coding sequence
for the TGF-β. The DNA encoding the extracellular matrix protein may encode a portion or a fragment of the
EMP or may encode the entire EMP. Likewise, the DNA encoding the TGF-β may be one or more fragments thereof
or the entire gene. Furthermore, two or more different TGF-βs or two or more different EMPs may be ligated upstream
or downstream of alternate moieties.

[0141] In yet another embodiment, a dermatan sulfate proteoglycan moiety, also known as decorin or proteoglycan 
II, is covalently linked with an EMP to form a chimeric protein. Decorin is known to bind to type I collagen and thus 
affect fibril formation, and to inhibit the cell attachment-promoting activity of collagen and fibrinogen by binding to such 
molecules near their cell binding sites. Chimeric proteins which contain a decorin moiety act to reduce scarring of
healing tissue. The primary structure of the core protein of decorin has been deduced from cloned cDNA. See Krusius
et al., Primary Structure of an Extracellular Matrix Proteoglycan Core Protein Deduced from Cloned cDNA, Proc. Natl.
Acad. Sci. (USA), Vol. 83, pp. 7683-7687 (1986) incorporated herein by reference. 

[0142] A decorin/EMP chimeric protein incorporates the known activities of decorin and provides integral scaffolding
or substratum of the EMP as described above to yield a composition which allows sustained release focal delivery to
target sites. Figs. 17A-17B illustrate a decorin/collagen IA chimeric protein (SEQ. ID. NO. 10) in which the collagen
domain includes amino acids 1-1057 and the decorin mature protein includes amino acids 1060-1388. Fig. 18 illustrates
a decorin peptide/collagen IA chimeric protein (SEQ. ID. NO. 11) in which the collagen helical domain includes amino
acids 1-1057 and the decorin peptide fragment includes amino acids 1060-1107. The decorin peptide fragment is
composed of P46 to G93 of the mature form of decorin.

[0143] Further provided is a chimeric DNA construct which includes a gene encoding decorin or one or more fragments
thereof, optionally ligated via a DNA linker sequence to a DNA sequence encoding an EMP such as collagen
(I-IV), fibrin, fibrinogen, fibronectin, elastin or laminin. A preferred chimeric DNA construct combines DNA encoding
decorin, a DNA linker sequence, and DNA encoding collagen I(α1). A chimeric DNA construct containing a decorin
gene and a collagen I(α1) gene is shown in Figs. 19A-19D (SEQ. ID. NO. 12). The illustrated construct includes an
XmnI linker sequence (bp 1-19), DNA encoding a collagen domain (bp 20-3190), a BglII linker sequence (bp
3191-3196), DNA encoding a mature form of decorin (bp 3197-4186) and a PstI linker sequence. A chimeric DNA
construct containing a decorin peptide gene and a collagen I(α1) gene is shown in Figs. 20A-20C (SEQ. ID. NO. 13).
The illustrated construct includes an XmnI linker sequence (bp 1-19), DNA encoding a collagen domain (bp 20-3190),
a BglII linker sequence (bp 3191-3196), DNA encoding a peptide fragment of decorin (bp 3197-3343), and a PstI linker
sequence (bp 3344-3349).

[0144] The coding sequence for an EMP may be ligated upstream and/or downstream and in-frame with a coding
sequence for decorin. The DNA encoding the EMP may encode a portion or fragment of the EMP or may encode the
entire EMP. Likewise, the DNA encoding decorin may be a fragment thereof or the entire gene. Furthermore, two or
more different EMPs may be ligated upstream and/or downstream from the DNA encoding the decorin moiety.
[0145] Any of the above described chimeric DNA constructs may be incorporated into a suitable cloning vector. Fig.
21 depicts a pMal cloning vector containing a polylinker cloning site. Examples of cloning vectors are the plasmids
pMal-p2 and pMal-c2 (commercially available from New England Biolabs). The desired chimeric DNA construct is
incorporated into a polylinker sequence of the plasmid which contains certain useful restriction endonuclease sites
which are depicted in Fig. 22 (SEQ. ID. NO. 14). The pMal-p2 polylinker sequence has XmnI, EcoRI, BamHI, HindIII,
XbaI, SalI and PstI restriction endonuclease sites which are depicted in Fig. 22. The polylinker sequence is digested
with an appropriate restriction endonuclease and the chimeric construct is incorporated into the cloning vector by
ligating it to the DNA sequences of the plasmid. The chimeric DNA construct may be joined to the plasmid by digesting
the ends of the DNA construct and the plasmid with the same restriction endonuclease to generate "sticky ends" having
5' phosphate and 3' hydroxyl groups which allow the DNA construct to anneal to the cloning vector. Gaps between the
inserted DNA construct and the plasmid are then sealed with DNA ligase. Other techniques for incorporating the DNA
construct into plasmid DNA include blunt end ligation, poly(dA.dT) tailing techniques, and the use of chemically
synthesized linkers. An alternative method for introducing the chimeric DNA construct into a cloning vector is to incorporate
the DNA encoding the extracellular matrix protein into a cloning vector already containing a gene encoding a
therapeutically active moiety.

[0146] The cloning sites in the above-identified polylinker site allow the cDNA for the collagen I(α1)/BMP-2B chimeric
protein illustrated in Figs. 14A-14C (SEQ. ID. NO. 7) to be inserted between the XmnI and the HindIII sites. The cDNA
encoding the collagen I(α1)/TGF-β1 protein illustrated in Figs. 16A-16C (SEQ. ID. NO. 9) is inserted between the XmnI
and the XbaI sites. The cDNA encoding the collagen I(α1)/decorin protein illustrated in Figs. 19A-19D (SEQ. ID. NO.
12) is inserted between the XmnI and the PstI sites. The cDNA encoding the collagen I(α1)/decorin peptide illustrated in
Figs. 20A-20C (SEQ. ID. NO. 13) is inserted between the XmnI and PstI sites.

[0147] Plasmids containing the chimeric DNA construct are identified by standard techniques such as gel
electrophoresis. Procedures and materials for preparation of recombinant vectors, transformation of host cells with the vectors,
and host cell expression of polypeptides are described in Sambrook et al., Molecular Cloning: A Laboratory Manual,
supra. Generally, prokaryotic or eukaryotic host cells may be transformed with the recombinant DNA plasmids.
Transformed host cells may be located through phenotypic selection genes of the cloning vector which provide resistance
to a particular antibiotic when the host cells are grown in a culture medium containing that antibiotic.
[0148] Transformed host cells are isolated and cultured to promote expression of the chimeric protein. The chimeric
protein may then be isolated from the culture medium and purified by various methods such as dialysis, density gradient
centrifugation, liquid column chromatography, isoelectric precipitation, solvent fractionation, and electrophoresis.
However, purification of the chimeric protein by affinity chromatography is preferred whereby the chimeric protein is purified
by ligating it to a binding protein and contacting it with a ligand or substrate to which the binding protein has a specific
affinity.

[0149] In order to obtain more effective expression of mammalian or human eukaryotic genes in bacteria
(prokaryotes), the mammalian or human gene may be placed under the control of a bacterial promoter. A protein fusion and
purification system is employed to obtain the chimeric protein. Preferably, any of the above-described chimeric DNA
constructs is cloned into a pMal vector at a site in the vector's polylinker sequence. As a result, the chimeric DNA
construct is operably fused with the malE gene of the pMal vector. The malE gene encodes maltose binding protein
(MBP). Fig. 23 depicts a pMal cloning vector containing a BMP/collagen DNA construct. A spacer sequence coding
for 10 asparagine residues is located between the malE sequence and the polylinker sequence. This spacer sequence
insulates MBP from the protein of interest. Figs. 24, 25 and 26 depict pMal cloning vectors containing DNA encoding
collagen chimeras with TGF-β1, decorin and a decorin peptide, respectively. The pMal vector containing any of the
chimeric DNA constructs fused to the malE gene is transformed into E. coli.

[0150] The E. coli is cultured in a medium which induces the bacteria to produce the maltose-binding protein fused
to the chimeric protein. This technique utilizes the promoter of the pMal vector. The MBP contains a 26 amino acid
N-terminal signal sequence which directs the MBP-chimeric protein through the E. coli cytoplasmic membrane. The
protein can then be purified from the periplasm. Alternatively, the pMal-c2 cloning vector can be used with this protein
fusion and purification system. The pMal-c2 vector contains an exact deletion of the malE signal sequence which
results in cytoplasmic expression of the fusion protein. A crude cell extract containing the fusion protein is prepared
and poured over a column of amylose resin. Since MBP has an affinity for the amylose it binds to the resin. Alternatively,
the column can include any substrate for which MBP has a specific affinity. Unwanted proteins present in the crude
extract are washed through the column. The MBP fused to the chimeric protein is eluted from the column with a neutral
buffer containing maltose or other dilute solution of a desorbing agent for displacing the hybrid polypeptide. The purified
MBP-chimeric protein is cleaved with a protease such as factor Xa protease to cleave the MBP from the chimeric
protein. The pMal-p2 plasmid has a sequence encoding the recognition site for protease factor Xa which cleaves after
the amino acid sequence Isoleucine-Glutamic acid-Glycine-Arginine of the polylinker sequence.
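
The cleavage rule stated above (factor Xa cutting after Ile-Glu-Gly-Arg) can be sketched on a one-letter protein string as follows. This is a minimal illustration only: the function name `cleave_after_site` and the toy fusion sequence are hypothetical placeholders, not the actual MBP or chimeric protein sequences.

```python
FACTOR_XA_SITE = "IEGR"  # Ile-Glu-Gly-Arg in one-letter amino acid codes


def cleave_after_site(protein, site=FACTOR_XA_SITE):
    """Split a one-letter protein string after the first occurrence of `site`.

    Returns (carrier_part, released_part); raises ValueError if no site is found.
    """
    index = protein.find(site)
    if index < 0:
        raise ValueError("recognition sequence not found")
    cut = index + len(site)
    return protein[:cut], protein[cut:]


# Hypothetical toy fusion: placeholder carrier, ten-asparagine spacer,
# recognition site, then a short collagen-like stretch.
toy_fusion = "MKIEEGKL" + "N" * 10 + FACTOR_XA_SITE + "GPPGLAGPPGESG"
carrier, released = cleave_after_site(toy_fusion)
print("carrier ends with:", carrier[-4:])
print("released:", released)
```

The sketch models only the position of the cut; it does not attempt to predict secondary cleavage sites or cleavage efficiency.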

[0151] The chimeric protein is then separated from the cleaved MBP by passing the mixture over an amylose column.
An alternative method for separating the MBP from the chimeric protein is by ion exchange chromatography. This
system yields up to 100 mg of MBP-chimeric protein per liter of culture. See Riggs, P., in Ausubel, F.M., Kingston, R.
E., Moore, D.D., Seidman, J.G., Smith, J.A., Struhl, K. (eds.) Current Protocols in Molecular Biology, Supplement 19
(16.6.1-16.6.10) (1990) Greene Associates/Wiley Interscience, New York, and the New England Biolabs (cat # 800-65S
9pMALc2) pMal protein fusion and purification system, hereby incorporated herein by reference. (See also European
Patent No. 286 239 herein incorporated by reference which discloses a similar method for production and purification
of a protein such as collagen.)

[0152] Other protein fusion and purification systems may be employed to produce chimeric proteins. Prokaryotes
such as E. coli are the preferred host cells for expression of the chimeric protein. However, systems which utilize
eukaryote host cell lines are also acceptable such as yeast, human, mouse, rat, hamster, monkey, amphibian, insect,
algae, and plant cell lines. For example, HeLa (human epithelial), 3T3 (mouse fibroblast), CHO (Chinese hamster
ovary), and SP 2 (mouse plasma cell) are acceptable cell lines. The particular host cells that are chosen should be
compatible with the particular cloning vector that is chosen.
[0153] Another acceptable protein expression system is the Baculovirus Expression System manufactured by
Invitrogen of San Diego, California. Baculoviruses form prominent crystal occlusions within the nuclei of cells they infect.
Each crystal occlusion consists of numerous virus particles enveloped in a protein called polyhedrin. In the baculovirus
expression system, the native gene encoding polyhedrin is substituted with a DNA construct encoding a protein or
peptide having a desired activity. The virus then produces large amounts of protein encoded by the foreign DNA
construct. The preferred cloning vector for use with this system is pBlueBac III (obtained from Invitrogen of San Diego,
California). The baculovirus system utilizes the Autographa californica multiple nuclear polyhedrosis virus (AcMNPV)
regulated polyhedrin promoter to drive expression of foreign genes. The chimeric gene, i.e., the DNA construct encoding
the chimeric protein, is inserted into the pBlueBac III vector immediately downstream from the baculovirus polyhedrin
promoter.

[0154] The pBlueBac III transfer vector contains a β-galactosidase reporter gene which allows for identification of
recombinant virus. The β-galactosidase gene is driven by the baculovirus ETL promoter (P_ETL) which is positioned in
opposite orientation to the polyhedrin promoter (P_PH) and the multiple cloning site of the vector. Therefore, recombinant
virus coexpresses β-galactosidase and the chimeric gene.

[0155] Spodoptera frugiperda (Sf9) insect cells are then cotransfected with wild type viral DNA and the pBlueBac III
vector containing the chimeric gene. Recombination sequences in the pBlueBac III vector direct the vector's integration
into the genome of the wild type baculovirus. Homologous recombination occurs resulting in replacement of the native
polyhedrin gene of the baculovirus with the DNA construct encoding the chimeric protein. Wild type baculovirus which
do not contain foreign DNA express the polyhedrin protein in the nuclei of the infected insect cells. However, the
recombinants do not produce polyhedrin protein and do not produce viral occlusions. Instead, the recombinants produce
the chimeric protein.

[0156] Alternative insect host cells for use with this expression system are the Sf21 cell line derived from Spodoptera
frugiperda and High Five cell lines derived from Trichoplusia ni.

[0157] Other acceptable cloning vectors include phages, cosmids or artificial chromosomes. For example,
bacteriophage lambda is a useful cloning vector. This phage can accept pieces of foreign DNA up to about 20,000 base pairs
in length. The lambda phage genome is a linear double stranded DNA molecule with single stranded complementary
(cohesive) ends which can hybridize with each other when inside an infected host cell. The lambda DNA is cut with a
restriction endonuclease and the foreign DNA, e.g. the DNA to be cloned, is ligated to the phage DNA fragments. The
resulting recombinant molecule is then packaged into infective phage particles. Host cells are infected with the phage
particles containing the recombinant DNA. The phage DNA replicates in the host cell to produce many copies of the
desired DNA sequence.

[0158] Cosmids are hybrid plasmid/bacteriophage vectors which can be used to clone DNA fragments of about
40,000 base pairs. Cosmids are plasmids which have one or more DNA sequences called "cos" sites derived from
bacteriophage lambda for packaging lambda DNA into infective phage particles. Two cosmids are ligated to the DNA
to be cloned. The resulting molecule is packaged into infective lambda phage particles and transfected into bacteria
host cells. When the cosmids are inside the host cell they behave like plasmids and multiply under the control of a
plasmid origin of replication. The origin of replication is a sequence of DNA which allows a plasmid to multiply within
a host cell.

[0159] Yeast artificial chromosome vectors are similar to plasmids but allow for the incorporation of much larger DNA
sequences of about 400,000 base pairs. The yeast artificial chromosomes contain sequences for replication in yeast.
The yeast artificial chromosome containing the DNA to be cloned is transformed into yeast cells where it replicates
thereby producing many copies of the desired DNA sequence. Where phage, cosmids, or yeast artificial chromosomes
are employed as cloning vectors, expression of the chimeric protein may be obtained by culturing host cells that have
been transfected or transformed with the cloning vector in a suitable culture medium.
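
The approximate insert capacities quoted above for bacteriophage lambda, cosmids and yeast artificial chromosomes can be compared against an insert of known length with a short sketch such as the following. The capacities are the round figures given in the text; the function name `candidate_vectors` and the example insert sizes are assumptions for illustration.

```python
# Approximate insert capacities quoted in the text (base pairs).
VECTOR_CAPACITY_BP = {
    "bacteriophage lambda": 20_000,
    "cosmid": 40_000,
    "yeast artificial chromosome": 400_000,
}


def candidate_vectors(insert_bp):
    """Return vectors whose quoted capacity is at least insert_bp, smallest first."""
    fitting = [(cap, name) for name, cap in VECTOR_CAPACITY_BP.items() if cap >= insert_bp]
    return [name for cap, name in sorted(fitting)]


print(candidate_vectors(3_349))    # a construct of a few kilobases
print(candidate_vectors(35_000))   # too large for the lambda figure quoted
print(candidate_vectors(500_000))  # exceeds every capacity quoted here
```

Capacity is only one criterion; as noted elsewhere in this description, the host cell must also be compatible with the chosen vector.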

[0160] Chimeric proteins disclosed herein are intended for use in treating mammals or other animals. The
therapeutically active moieties described above, e.g., osteogenic agents such as BMPs, TGFs, decorin, and/or fragments of
each of them, are all to be considered as being or having been derived from physiologically active agents for purposes
of this description. The chimeric proteins and DNA constructs which incorporate a domain derived from one or more
cellular physiologically active agents can be used for in vivo therapeutic treatment, in vitro research or for diagnostic
purposes in general.

[0161] When used in vivo, formulations containing the present chimeric proteins may be placed in direct contact with
viable tissue, including bone, to induce or enhance growth, repair and/or replacement of such tissue. This may be
accomplished by applying a chimeric protein directly to a target site during surgery. It is contemplated that minimally
invasive techniques such as endoscopy are to be used to apply a chimeric protein to a desired location. Formulations
containing the chimeric proteins disclosed herein may consist solely of one or more chimeric proteins or may also
incorporate one or more pharmaceutically acceptable adjuvants.

[0162] In an alternate embodiment, any of the above-described chimeric proteins may be contacted with, adhered
to, or otherwise incorporated into an implant such as a drug delivery device or a prosthetic device. Chimeric proteins
may be microencapsulated or macroencapsulated by liposomes or other membrane forming materials such as alginic
acid derivatives prior to implantation and then implanted in the form of a pouchlike implant. The chimeric protein may
be microencapsulated in structures in the form of spheres, aggregates of core material embedded in a continuum of
wall material or capillary designs. Microencapsulation techniques are well known in the art and are described in the
Encyclopedia of Polymer Science and Engineering, Vol. 9, pp. 724 et seq. (1980) hereby incorporated herein by
reference.

[0163] Chimeric proteins may also be coated on or incorporated into medically useful materials such as meshes,
pads, felts, dressings or prosthetic devices such as rods, pins, bone plates, artificial joints, artificial limbs or bone
augmentation implants. The implants may, in part, be made of biocompatible materials such as glass, metal, ceramic,
calcium phosphate or calcium carbonate based materials. Implants having biocompatible biomaterials are well known
in the art and are all suitable for use herein. Implant biomaterials derived from natural sources such as protein fibers,
polysaccharides, and treated naturally derived tissues are described in the Encyclopedia of Polymer Science and
Engineering, Vol. 2, pp. 267 et seq. (1989) hereby incorporated herein by reference. Synthetic biocompatible polymers
are well known in the art and are also suitable implant materials. Examples of suitable synthetic polymers include
urethanes, olefins, terephthalates, acrylates, polyesters and the like. Other acceptable implant materials are
biodegradable hydrogels or aggregations of closely packed particles such as polymethylmethacrylate beads with a
polymerized hydroxyethyl methacrylate coating. See the Encyclopedia of Polymer Science and Engineering, Vol. 2, pp. 267
et seq. (1989) hereby incorporated herein by reference.

[0164] The chimeric protein herein provides a useful way for immobilizing or coating a physiologically active agent
on a pharmaceutically acceptable vehicle to deliver the physiologically active agent to desired sites in viable tissue.
Suitable vehicles include those made of bioabsorbable polymers, biocompatible nonabsorbable polymers, lactoner
putty and plaster of Paris. Examples of suitable bioabsorbable and biocompatible polymers include homopolymers,
copolymers and blends of hydroxyacids such as lactide and glycolide, other absorbable polymers which may be used
alone or in combination with hydroxyacids including dioxanones, carbonates such as trimethylene carbonate, lactones
such as caprolactone, polyoxyalkylenes, and oxalates. See the Encyclopedia of Polymer Science and Engineering,
Vol. 2, pp. 230 et seq. (1989) hereby incorporated herein by reference.

[0165] These vehicles may be in the form of beads, particles, putty, coatings or film vehicles. Diffusional systems in
which a core of chimeric protein is surrounded by a porous membrane layer are other acceptable vehicles.

[0166] In another aspect, the amount of amino acid analog(s) transported into a target cell can be regulated by
controlling the tonicity of the growth media. A hypertonic growth medium increases uptake of trans-4-hydroxyproline into E.
coli as illustrated in Figure 2A. All known methods of increasing osmolality of growth media are appropriate for use
herein including addition of salts such as sodium chloride, KCl, MgCl2 and the like, and sugars such as sucrose,
glucose, maltose, etc. and polymers such as polyethylene glycol (PEG), dextran, cellulose, etc. and amino acids such
as glycine. Increasing the osmolality of growth media results in greater intracellular concentration of amino acid
analog(s) and a higher degree of complexation of amino acid analog(s) to tRNA. As a consequence, proteins produced by
the cell achieve a higher degree of incorporation of amino acid analogs. Figure 12 illustrates percentage of incorporation
of proline and hydroxyproline into MBP under isotonic and hypertonic media conditions in comparison to proline in
native MBP. Thus, manipulating osmolality, in addition to adjusting concentration of amino acid analog(s) in growth
media allows a dual-faceted approach to regulating their uptake into prokaryotic cells and eukaryotic cells as described
above and consequent incorporation into target polypeptides.
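
As a rough guide to how much a given osmolyte addition raises the solute particle concentration of a medium, the following sketch applies the ideal-solution approximation (each formula unit contributes its full number of dissociated particles). It estimates added osmolarity rather than measured osmolality, and real solutions deviate from the ideal values; the particle counts, the function name `added_osmolarity_mosm` and the example additions are assumptions for illustration.

```python
# Ideal particle counts per formula unit (complete dissociation assumed).
PARTICLES_PER_FORMULA_UNIT = {
    "NaCl": 2,
    "KCl": 2,
    "MgCl2": 3,
    "sucrose": 1,
    "glucose": 1,
}


def added_osmolarity_mosm(additions_mM):
    """Estimate the ideal osmolarity (mOsm/L) contributed by added solutes.

    additions_mM maps a solute name to its added concentration in mM.
    """
    return sum(PARTICLES_PER_FORMULA_UNIT[solute] * conc
               for solute, conc in additions_mM.items())


print(added_osmolarity_mosm({"NaCl": 500}))      # salt added at 500 mM
print(added_osmolarity_mosm({"sucrose": 1000}))  # a non-dissociating sugar
```

On this approximation a non-dissociating solute must be added at about twice the molar concentration of a 1:1 salt to contribute the same number of particles.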

[0167] Any growth media can be used herein including commercially available growth media such as M9 minimal 
medium (available from Gibco Life Technologies, Inc.), LB medium, NZCYM medium, terrific broth, SOB medium and 
others that are well known in the art. 

[0168] Collagen from different tissues can contain different amounts of trans-4-hydroxyproline. For example, tissues
that require greater strength such as bone contain a higher number of trans-4-hydroxyproline residues than collagen
in tissues requiring less strength, e.g., skin. The present system provides a method of adjusting the amount of
trans-4-hydroxyproline in collagen, collagen fragments, collagen-like peptides, and chimeric peptides having a collagen
domain, collagen fragment domain or collagen-like peptide domain fused to a physiologically active domain, since by
increasing or decreasing the concentration of trans-4-hydroxyproline in growth media, the amount of
trans-4-hydroxyproline incorporated into such polypeptides is increased or decreased accordingly. The collagen, collagen fragments,
collagen-like peptides and above-described chimeric peptides can be expressed with predetermined levels of
trans-4-hydroxyproline. In this manner physical characteristics of an extracellular matrix can be adjusted based upon requirements of
end use. Without wishing to be bound by any particular theory, it is believed that incorporation of trans-4-hydroxyproline
into the EMP moieties herein provides a basis for self aggregation as described herein.

[0169] In another aspect, the combination of incorporation of trans-4-hydroxyproline into collagen and fragments 
thereof using hyperosmotic media and genes which have been altered such that codon usage more closely reflects 
that found in E. coli, but retaining the amino acid sequence found in native human collagen, surprisingly resulted in 
production by E. coli of human collagen and fragments thereof which were capable of self aggregation. 

[0170] The human collagen Type I (α1) gene sequence (Figure 27A-27E) (SEQ. ID. NO. 15) contains a large number
of glycine and proline codons (347 glycine and 240 proline codons) arranged in a highly repetitive manner. Table I
below is a codon frequency tabulation for the human Type I (α1) collagen gene. Of particular note is that the GGA
glycine codon occurs 64 times and the CCC codon for proline occurs 93 times. Both of these codons are considered
to be rare codons in E. coli. See Sharp, P.M. and W.-H. Li, Nucleic Acids Res. 14: 7737-7749, 1986. These, and similar
considerations for other human collagen genes are shown herein to account for the difficulty in expressing human
collagen genes in E. coli.
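
The codon counts quoted above can be turned into fractions to show how much of the glycine and proline coding capacity of the human gene falls on the two rare codons. The sketch below uses only the four counts stated in the text; the dictionary layout and the function name `rare_fraction` are illustrative assumptions.

```python
# Counts quoted in the text for the human Type I (alpha 1) collagen gene.
CODON_COUNTS = {
    "glycine": {"total": 347, "rare_codon": "GGA", "rare_count": 64},
    "proline": {"total": 240, "rare_codon": "CCC", "rare_count": 93},
}


def rare_fraction(entry):
    """Fraction of an amino acid's codons that use the named rare codon."""
    return entry["rare_count"] / entry["total"]


for amino_acid, entry in CODON_COUNTS.items():
    print(f"{amino_acid}: {entry['rare_codon']} is "
          f"{100 * rare_fraction(entry):.1f}% of {entry['total']} codons")

combined_rare = sum(e["rare_count"] for e in CODON_COUNTS.values())
combined_total = sum(e["total"] for e in CODON_COUNTS.values())
print(f"combined: {combined_rare} of {combined_total} glycine and proline codons")
```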
[0171] In a first step, the sequence of the heterologous collagen gene is changed to reflect the codon bias in E. coli
as given in codon usage tables (e.g. Ausubel et al., (1995) Current Protocols in Molecular Biology, John Wiley & Sons,
New York, New York; Wada et al., 1992, supra). Rare E. coli codons (See, Sharp, P.M. and W.-H. Li, Nucleic Acids
Res. 14: 7737-7749, 1986) are avoided. Second, unique restriction enzyme sites are chosen that are located
approximately every 120-150 base pairs in the sequence. In certain cases this entails altering the nucleotide sequence but
does not change the amino acid sequence. Third, oligos of approximately 80 nucleotides are synthesized such that
when two such oligos are annealed together and extended with a DNA polymerase they reconstruct an approximately
120-150 base pair section of the gene (Figure 28). The section of the gene encoding the very amino terminal portion
of the protein has an initiating methionine (ATG) codon at the 5' end and a unique restriction site followed by a stop
(TAAT) signal at the 3' end. The remaining sections have unique restriction sites at the 5' end and unique restriction
sites followed by a TAAT stop signal at the 3' end. The gene is assembled by sequential addition of each section to the
preceding 5' section. In this manner, each successively larger section can be independently constructed and expressed.
Figure 28 is a schematic representation of the construction of the human collagen gene starting from synthetic oligos.
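
The arithmetic behind the third step can be sketched as follows: when two oligos anneal through their 3' ends and are extended, the resulting duplex length equals the sum of the oligo lengths minus their overlap. The sketch also estimates how many sections a gene of a given length would require. The function names and the 3171 bp example length (the collagen domain span implied by bp 20-3190 of the construct of Figs. 20A-20C) are assumptions for illustration, and the sketch ignores the added restriction sites and stop signals.

```python
import math


def required_overlap(oligo_a_nt, oligo_b_nt, section_bp):
    """Overlap (nt) two oligos need so that extension yields section_bp."""
    overlap = oligo_a_nt + oligo_b_nt - section_bp
    if overlap <= 0:
        raise ValueError("oligos are too short to span the section")
    if overlap > min(oligo_a_nt, oligo_b_nt):
        raise ValueError("section is shorter than the longer oligo")
    return overlap


def sections_needed(gene_bp, section_bp):
    """Number of sections of at most section_bp needed to cover gene_bp."""
    return math.ceil(gene_bp / section_bp)


for section in (120, 150):
    print(f"{section} bp section: overlap {required_overlap(80, 80, section)} nt, "
          f"{sections_needed(3171, section)} sections for 3171 bp")
```

With 80-nucleotide oligos, the stated 120-150 base pair range corresponds to overlaps of roughly 40 down to 10 nucleotides.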
[0172] A fragment of the human Type I α1 collagen chain fused to the C-terminus of glutathione S-transferase (GST-D4,
Fig. 29) (SEQ. ID. NO. 18) was prepared and tested for expression in E. coli strain JM109 (F-) under conditions
of hyperosmotic shock. The collagen fragment included the C-terminal 193 amino acids of the triple helical region and
the 26 amino acid C-terminal telopeptide. Fig. 29 is a schematic of the amino acid sequence of the GST-ColECol (SEQ.
ID. NO. 17) and GST-D4 (SEQ. ID. NO. 18) fusion proteins. ColECol comprises the 17 amino acid N-terminal
telopeptide, 338 Gly-X-Y repeating tripeptides, and the 26 amino acid C-terminal telopeptide. There is a unique methionine
at the junction of GST and D4, followed by 64 Gly-X-Y repeats, and the 26 amino acid telopeptide. The residue (Phe199)
in the C-terminal telopeptide of D4 where pepsin cleaves is indicated. The gene for the collagen fragment was
synthesized from synthetic oligonucleotides designed to reflect optimal E. coli codon usage. Fig. 30 is a table depicting occurrence
of the four proline and four glycine codons in the human Type I α1 gene (HCol) and the Type I α1 gene with optimized
E. coli codon usage (ColECol). Usage of the remaining codons in ColECol was also optimized for E. coli expression
according to Wada et al., supra. Protein GST-D4 was efficiently expressed in JM109 (F-) in minimal media lacking
proline but supplemented with Hyp and NaCl (See Figs. 31 and 32). Expression was dependent on induction with
isopropyl-1-thio-β-galactopyranoside (IPTG), trans-4-hydroxyproline and NaCl. At a fixed NaCl concentration of 500
mM, expression was minimal at trans-4-hydroxyproline concentrations below ~20 mM while the expression level
plateaued at trans-4-hydroxyproline concentrations above 40 mM. See Fig. 31 which depicts a gel showing expression
and dependence of expression of GST-D4 on hydroxyproline. The concentration of hydroxyproline is indicated above
each lane. Osmolyte (NaCl) was added at 500 mM in each culture and each was induced with 1.5 mM IPTG. The arrow
marks the position of GST-D4. Likewise, at a fixed trans-4-hydroxyproline concentration of 40 mM, NaCl concentrations
below 300 mM resulted in little protein accumulation and expression decreased above 700-800 mM NaCl. See Fig. 32
which depicts a gel showing expression of GST-D4 in hyperosmotic media. Lanes 2 and 3 are uninduced and induced
samples, respectively, each without added osmolyte. The identity and quantity of osmolyte is indicated above each of
the other lanes. Trans-4-hydroxyproline was added at 40 mM in each culture and all cultures except that in lane 1 were
induced with 1.5 mM IPTG. The arrow marks the position of GST-D4.

[0173] Either sucrose or KCl can be substituted for NaCl as the osmolyte (See Fig. 32). Thus, the osmotic
shock-mediated intracellular accumulation of trans-4-hydroxyproline was a critical determinant of expression rather than the
precise chemical identity of the osmolyte. Despite the large number of prolines (66) in GST-D4, its size (46 kDa), and
non-optimal growth conditions, it was expressed at ~10% of the total cellular protein. Expressed proteins of less than
full-length indicative of aborted transcription, translation, or mRNA instability were not detected.
[0174] The gene for protein D4 contains 52 proline codons. In the expression experiments reflected in Figs. 31 and
32, it was expected that trans-4-hydroxyproline would be inserted at each of these codons resulting in a protein where
trans-4-hydroxyproline had been substituted for all prolines. To confirm this, GST-D4 was cleaved with BrCN in 0.1 N
HCl at methionines within GST and at the unique methionine at the N-terminal end of D4, and D4 purified by reverse
phase HPLC. Crude GST-D4 was dissolved in 0.1 M HCl in a round bottom flask with stirring. Following addition of a
2-10 fold molar excess of clear, crystalline BrCN, the flask was evacuated and filled with nitrogen. Cleavage was
allowed to proceed for 24 hours, at which time the solvent was removed in vacuo. The residue was dissolved in 0.1%
trifluoroacetic acid (TFA) and purified by reverse-phase HPLC using a Vydac C4 RP-HPLC column (10 x 250 mm, 5
µ, 300 Å) on a BioCad Sprint system (Perceptive Biosystems, Framingham, MA). D4 was eluted with a gradient of 15
to 40% acetonitrile/0.1% TFA over a 45 min. period. D4 eluted as a single peak at 26% acetonitrile/0.1% TFA. Standard
BrCN cleavage conditions (70% formic acid) resulted in extensive formylation of D4, presumably at the hydroxyl groups
of the trans-4-hydroxyproline residues. Formylation of BrCN/formic acid-cleaved proteins had been noted before
(Beavis et al., Anal. Chem., 62, 1836 (1990)). Amino acid analysis was carried out on a Beckman ion exchange instrument
with post-column derivatization. N-terminal sequencing was performed on an Applied Biosystems sequencer equipped
with an on-line HPLC system. Electrospray mass spectra were obtained with a VG Biotech BIO-Q quadrupole analyzer
by M-Scan, Inc. (West Chester, PA). For CD thermal melts, the temperature was raised in 0.5°C increments from 4°C
to 85°C with a four minute equilibration between steps. Data were recorded at 221.5 nm. The thermal transition was
calculated using the program ThermoDyne (MORE). The electrospray mass spectroscopy of this protein gave a single
molecular ion corresponding to a mass of 20,807 Da. This mass is within 0.05% of that expected for D4 if it contains
100% trans-4-hydroxyproline in lieu of proline. Proline was not detected in amino acid analysis of purified D4, again
consistent with complete substitution of trans-4-hydroxyproline for proline. To confirm further that
trans-4-hydroxyproline substitution had only occurred at proline codons, the N-terminal 13 amino acids of D4 were sequenced as above.
The first 13 codons of D4 specify the protein sequence H2N-Gly-Pro-Pro-Gly-Leu-Ala-Gly-Pro-Pro-Gly-Glu-Ser-Gly
(SEQ. ID. NO. 41). The sequence found was H2N-Gly-Hyp-Hyp-Gly-Leu-Ala-Gly-Hyp-Hyp-Gly-Glu-Ser-Gly (SEQ. ID.
NO. 42), see Fig. 69. Taken together, these results indicate that trans-4-hydroxyproline (Hyp) was inserted only at
proline codons and that the fidelity of the E. coli translational machinery was not otherwise altered by either the high
intracellular concentration of trans-4-hydroxyproline or hyperosmotic culture conditions.
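
The mass spectrometric argument above can be made concrete with a short calculation. Replacing proline with trans-4-hydroxyproline adds one oxygen atom per residue, so the total shift for 52 sites and the size of the 0.05% agreement window can be compared directly. The sketch uses the average atomic mass of oxygen (about 15.999 Da); the variable and function names are illustrative, and the window is computed relative to the observed mass as an approximation.

```python
OXYGEN_AVERAGE_MASS_DA = 15.999   # mass added per Pro -> Hyp substitution
N_PROLINE_CODONS = 52             # proline codons in the D4 gene (from the text)
OBSERVED_MASS_DA = 20_807.0       # electrospray mass reported in the text
AGREEMENT_FRACTION = 0.0005       # "within 0.05%"


def hydroxylation_mass_shift(n_sites):
    """Total mass added when n_sites prolines are replaced by hydroxyproline."""
    return n_sites * OXYGEN_AVERAGE_MASS_DA


shift = hydroxylation_mass_shift(N_PROLINE_CODONS)
window = AGREEMENT_FRACTION * OBSERVED_MASS_DA
print(f"shift for {N_PROLINE_CODONS} sites: {shift:.1f} Da")
print(f"0.05% window: {window:.1f} Da")
print("window smaller than one oxygen:", window < OXYGEN_AVERAGE_MASS_DA)
```

Because the agreement window is smaller than the mass of a single oxygen atom, a protein lacking even one hydroxyl group would be expected to fall outside it, which is consistent with the conclusion of complete substitution.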

[0175] To determine whether D4, containing trans-4-hydroxyproline in both the X and Y positions, forms homotrimeric
helices and to compare stability to native collagen, the following was noted: In neutral pH phosphate buffer, D4 exhibits
a circular dichroism (CD) spectrum characteristic of a triple helix (See Fig. 33 and Bhatnagar et al., Circular Dichroism
and the Conformational Analysis of Biomolecules, G.D. Fasman, Ed. Plenum Press, New York, (1996) p. 183). Fig. 33
illustrates circular dichroism spectra of native and heat-denatured D4 in neutral phosphate buffer. HPLC-purified D4
was dissolved in 0.1 M sodium phosphate, pH 7.0, to a final concentration of 1 mg/mL (ε280 = 3628 M⁻¹ cm⁻¹). The
solution was incubated at 4°C for two days to allow triple helices to form prior to analysis. Spectra were obtained on
an Aviv model 62DS spectropolarimeter (Yale University, Molecular Biophysics and Biochemistry Department). A 1
mm path length quartz suprasil fluorimeter cell was used. Following a 10 min. incubation period at 4°C, standard
wavelength spectra were recorded from 260 to 190 nm using 10 sec acquisition times and 0.5 nm scan steps. This
spectrum is characterized by a negative ellipticity at 198 nm and a positive ellipticity at 221 nm. The magnitudes of
both of these absorbances were greater in neutral pH buffer compared to acidic conditions. Comparable dependence
of stability on pH has been noted for collagen-like triple helices. See, e.g., Venugopal et al., Biochemistry, 33, 7948
(1994). Heating at 85°C for five minutes prior to obtaining the CD spectrum decreased the magnitude of the absorbance
at 198 nm and abolished the absorbance at 221 nm (Fig. 33). This behavior is also typical of the triple helical structure
of collagen. See, R.S. Bhatnagar et al., Circular Dichroism and the Conformational Analysis of Biomolecules G.D.
Fasman, Ed., supra. A thermal melt profile of D4 conducted as above in phosphate buffer gave a melting temperature
of about 29°C. A fragment of the C-terminal region of the bovine Type I α1 collagen chain comparable in length to D4
forms homotrimeric helices with a melting temperature of 26°C. (See, A. Rossi, et al., Biochemistry 35, 6048 (1996)).
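
For orientation, the sample concentration stated above can be converted to molar units and related to the quoted extinction coefficient through the Beer-Lambert law, A = ε·c·l. The sketch below assumes the molecular mass of 20,807 Da reported for D4 and, purely as an example, the 1 mm path length of the CD cell; the function names are illustrative and the result is an estimate, not a reported measurement.

```python
D4_MASS_G_PER_MOL = 20_807.0      # electrospray mass reported for D4
EPSILON_280_PER_M_PER_CM = 3628.0  # extinction coefficient quoted in the text


def molar_concentration(mg_per_ml, mass_g_per_mol):
    """Convert mg/mL (numerically equal to g/L) to mol/L."""
    return mg_per_ml / mass_g_per_mol


def absorbance(epsilon, conc_molar, path_cm):
    """Beer-Lambert law: A = epsilon * c * l."""
    return epsilon * conc_molar * path_cm


conc_molar = molar_concentration(1.0, D4_MASS_G_PER_MOL)
conc_micromolar = conc_molar * 1e6
a280_1mm = absorbance(EPSILON_280_PER_M_PER_CM, conc_molar, path_cm=0.1)
print(f"1 mg/mL D4 is about {conc_micromolar:.1f} micromolar")
print(f"estimated A280 in a 1 mm cell: {a280_1mm:.4f}")
```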
[0176] Resistance to pepsin digestion is a second commonly used indication of triple helical structure. At 4°C, the
majority of D4 is digested rapidly by pepsin to a protein of slightly lower molecular weight. Fig. 34 is a gel illustrating
the result of digestion of D4 with bovine pepsin. Purified D4 was dissolved in 0.1 M sodium phosphate, pH 7.0, to 1.6
ng/µl and incubated at 4°C for 7 days. Aliquots (10 µl) were placed into 1.5 ml centrifuge tubes and adjusted with water
and 1 M acetic acid solutions to 25 µl final volume and 200 mM final acetic acid concentration. Each tube was then
incubated for 20 min. at the indicated temperature and pepsin (0.5 µl of a 0.25 ng/µl solution) was added to each tube
and digestion allowed to proceed for 45 minutes. Following digestion, samples were quenched with loading buffer and
analyzed by SDS-PAGE. However, the initial pepsin cleavage product is resistant to further digestion up to ~30°C.
Amino terminal sequencing as above of the initial pepsin cleavage product showed that the N-terminus was identical
to that of full-length D4. Mass spectral analysis as above of the digestion product gave a parent ion with a molecular
weight consistent with cleavage in the C-terminal telopeptide on the N-terminal side of Phe199 (See Fig. 29) suggesting
that this portion of the protein is either globular or of ill-defined structure and rapidly cleaved by pepsin while the triple
helical region is resistant to digestion. Thus, despite global trans-4-hydroxyproline for proline substitution in both the
X and Y positions, D4 formed triple helices of stability similar to comparably sized fragments of bovine collagen
containing Hyp at the normal percentage and only in the Y position.

[0177] The full-length human Type I α1 collagen chain, although more than four times the size of D4, was also expressed
as an N-terminal fusion with GST (GST-ColECol, Fig. 29) in JM109 (F-) in Hyp/NaCl media. Fig. 35 is a gel depicting
expression of GST-HCol and GST-ColECol. Trans-4-hydroxyproline was added at 40 mM and NaCl at 500 mM.
Expression was induced with 1.5 mM IPTG. The arrow marks the position of GST-ColECol. In the procedures resulting
in the gels shown in Figs. 31, 32 and 35, five ml cultures of JM109 (F-) harboring the expression plasmid in LB media
containing 100 µg/ml ampicillin were grown overnight. Cultures were centrifuged and the cell pellets washed twice with
five ml of M9/Amp media (See, J. Sambrook, E.F. Fritsch, T. Maniatis, Molecular Cloning: A Laboratory Manual (Cold
Spring Harbor Laboratory, Cold Spring Harbor, NY, 1989)) supplemented with 0.5% glucose and 100 µg/ml of all amino
acids except glycine and alanine, which were at 200 µg/ml, and containing no proline. The cells were finally resuspended
in five ml of the above media. Following incubation at 37°C for 30 min., hydroxyproline, osmolyte, or IPTG were added
as indicated. After four hours, aliquots of the cultures were analyzed by SDS-PAGE.

[0178] Like D4, the gene for protein ColECol was constructed from synthetic oligonucleotides designed to mimic
codon usage in highly-expressed E. coli genes. In contrast to GST-ColECol, expression from a GST-human Type I α1
gene fusion (pHCol) identical to GST-ColECol in coded amino acid sequence but containing the human codon
distribution could not be detected in Coomassie blue-stained SDS-PAGE gels of total cell lysates of induced JM109 (F-)/
pHCol cultures (Fig. 35). The gene for the Type I α1 collagen polypeptide was cloned by polymerase chain reaction
of the gene from mRNA isolated from human foreskin cells (HS27, ATCC 1634) with primers designed from the
published gene sequence (GenBank Z74615). The 5' primer added a flanking EcoR I recognition site and the 3' primer a
flanking Hind III recognition site. The gene was cloned into the EcoR I/Hind III site of plasmid pBSKS+ (Stratagene, La
Jolla, CA), four mutations corrected using the ExSite mutagenesis kit (Stratagene, La Jolla, CA), the sequence
confirmed by dideoxy sequencing, and finally the EcoR I/Xho I fragment subcloned into plasmid pGEX-4T.1 (Pharmacia,
Piscataway, NJ). The GST-HCol gene is expression-competent because a protein of the same molecular weight as
GST-ColECol is detected when immunoblots of total cell lysates are probed with an anti-Type I collagen antibody. Thus,
sequence or structural differences between the genes for ColECol and HCol are critical determinants of expression
efficiency in E. coli. This is likely due to the codon distribution in these genes and ultimately to differences in tRNA
isoacceptor levels in E. coli compared to humans. GST-ColECol, GST-D4, and GST-HCol do not accumulate in
hyperosmotic shock media when proline is substituted for hydroxyproline or in rich media. A possible explanation is that the
trans-4-hydroxyproline-containing proteins may be resistant to degradation because they fold into a protease-resistant
triple helix while the proline-containing proteins do not adopt this structure. The large number of codons non-optimal
for E. coli found in the human gene and the instability of proline-containing collagen in E. coli may, in part, explain why
expression of human collagen in E. coli has not been previously reported.

[0179] As discussed above, collagen mimetic polypeptides, i.e., engineered polypeptides having certain
compositional and structural traits in common with collagen, are also provided herein. Such collagen mimetic polypeptides may
also be made to incorporate amino acid analogs as described above. GST-CM4 consists of glutathione S-transferase
fused to 30 repeats of a Gly-X-Y sequence. The Gly-X-Y repeating section mimics the Gly-X-Y repeating unit of human
collagen and is referred to as collagen mimetic 4 or CM4 herein. Thus, the hydroxyproline-incorporating technology
was also demonstrated to work with a protein and DNA sequence analogous to that found in human collagen. Amino
acid analysis of purified CM4 protein expressed in E. coli strain JM109 (F-) under hydroxyproline-incorporating conditions,
compared to analysis of the same protein expressed under proline-incorporating conditions, demonstrates that the
techniques herein result in essentially complete substitution of hydroxyproline for proline. The amino acid analysis was
performed on CM4 protein that had been cleaved from and purified away from GST. This removes any possible
ambiguities associated with the fusion protein.

[0180] Expression in media containing at least about 200 mM NaCl is preferable to accumulate a significant amount
of protein containing hydroxyproline. A concentration of about 400-500 mM NaCl appears to be optimal. Either KCl,
sucrose or combinations thereof may be used in substitution of or with NaCl. However, expression in media without
an added osmolyte (i.e. under conditions that more closely mimic those of Deming et al., In Vivo Incorporation of Proline
Analogs into Artificial Protein, Poly. Mater. Sci. Engin. Proceed., supra.) did not result in significant expression of
hydroxyproline-containing proteins in JM109 (F-). This is illustrated in Figure 36 which is a scan of a SDS-PAGE gel
showing the expression of GST-CM4 in media with or without 500 mM NaCl and containing either proline or
hydroxyproline. The SDS-PAGE gel reflects 5 hour post-induction samples of GST-CM4 expressed in JM109 (F-). Equivalent
amounts, based on OD600nm, of each culture were loaded in each lane. Gels were stained with Coomassie Blue,
destained, and scanned on a PDI 420oe scanner. Lane 1: 2.5 mM proline/0 mM NaCl. Lane 2: 2.5 mM proline/500 mM
NaCl. Lane 3: 80 mM hydroxyproline/0 mM NaCl. Lane 4: 80 mM hydroxyproline/500 mM NaCl. Lane 5: Molecular weight
markers. The lower arrow indicates the migration position of proline-containing GST-CM4 in lanes 1 and 2. The upper
arrow indicates the migration position of hydroxyproline-containing GST-CM4 in lanes 3 and 4. Note that GST-CM4
expressed in the presence of hydroxyproline runs at a higher apparent molecular weight (compare lanes 1 and 4). This
is expected since hydroxyproline is of greater molecular weight than proline. If all the prolines in GST-CM4 are
substituted with hydroxyproline, the increase in molecular weight is 671 Da (+2%). Note also that protein expressed in the
presence of proline accumulates in cultures irrespective of the NaCl concentration (compare lanes 1 and 2). In contrast,
significant expression in the presence of hydroxyproline only occurs in the culture containing 500 mM NaCl (compare
lanes 3 and 4). Figure 37 further illustrates the dependence of expression on NaCl concentration by showing that
significant expression of GST-CM4 occurs only at NaCl concentrations greater than 200 mM. The SDS-PAGE gel reflects
6 hour post-induction samples of GST-CM4 expressed in JM109 (F-) with varying concentrations of NaCl. All cultures
contained 80 mM hydroxyproline. Lane 1: 500 mM NaCl, not induced. Lanes 2-6: 500 mM, 400 mM, 300 mM, 200 mM,
and 100 mM NaCl, respectively, all induced with 1.5 mM IPTG. Lane 7: Molecular weight markers. The arrow indicates
the migration position of hydroxyproline-containing GST-CM4. Figure 38 is a scan of an SDS-PAGE gel of expression
of GST-CM4 in either 400 mM NaCl or 800 mM sucrose. The SDS-PAGE gel reflects 4 hour post-induction samples
of GST-CM4 expressed in JM109 (F-). All cultures contained 80 mM hydroxyproline and all, except that electrophoresed
in lane 2, contained 400 mM NaCl. Lane 2 demonstrates expression in sucrose in lieu of NaCl. Lane 1: Molecular
weight markers. Lane 2: 800 mM sucrose (no NaCl). Lanes 3-9: 0 mM, 0.025 mM, 0.1 mM, 0.4 mM, 0.8 mM, 1.25 mM,
2.5 mM proline, respectively. The upper arrow indicates the migration position of hydroxyproline-containing GST-CM4
and the lower arrow indicates the migration position of proline-containing GST-CM4. Expression is apparent in both
cases (compare lanes 2 and 3).
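
The molecular weight increase stated above can be checked with simple arithmetic. The following is a minimal sketch, assuming that each proline-to-hydroxyproline substitution adds one oxygen atom (average mass taken as about 15.999 Da). The function names are illustrative only, and the proline count of GST-CM4 is not stated in this paragraph; the value printed is merely what a 671 Da shift would imply under that assumption.

```python
# Assumption: one Pro -> Hyp substitution adds a single oxygen atom.
OXYGEN_AVG_MASS_DA = 15.999


def hyp_mass_shift(n_pro: int) -> float:
    """Mass increase (Da) if n_pro prolines are all replaced by hydroxyproline."""
    return n_pro * OXYGEN_AVG_MASS_DA


def implied_proline_count(shift_da: float) -> float:
    """Number of substituted prolines implied by an observed mass shift (Da)."""
    return shift_da / OXYGEN_AVG_MASS_DA


# A 671 Da shift corresponds to roughly 42 substituted prolines.
print(round(implied_proline_count(671.0), 1))  # 41.9
```
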

[0181] If expression of GST-CM4, as described in Example 17 below, is performed in varying ratios of hydroxyproline
and proline, the expressed protein appears to contain varying amounts of hydroxyproline. Thus, if only hydroxyproline
is present during expression, a single expressed protein of the expected molecular weight is evident on a SDS-PAGE
gel (Figure 38, lane 3). If greater than approximately 1 mM proline is present, again a single expressed protein is
evident, but at a lower apparent molecular weight, as expected for the protein containing only proline (Figure 38, lanes
7-9). If lesser amounts of proline are used during expression, species of apparent molecular weight intermediate between
these extremes are evident. This phenomenon, evident as a "smear" or "ladder" of proteins running between the two
molecular weight extremes on an SDS-PAGE gel, is illustrated in lanes 3-9 of Figure 38. Lanes 3-9 on this gel are
proteins from expression in a fixed concentration of 80 mM hydroxyproline and 400 mM NaCl. However, in moving
from lane 3 to 9 the proline concentration increases from none (lane 3) to 2.5 mM (lane 9) and expression shifts from
a protein of higher molecular weight (hydroxyproline-containing GST-CM4) to lower molecular weight
(proline-containing GST-CM4). At proline concentrations of 0.025 mM and 0.1 mM, species of intermediate molecular weight are
apparent (lanes 4 and 5). This clearly demonstrates that the percent incorporation of hydroxyproline in an expressed
protein can be controlled by expression in varying ratios of analogue to amino acid.
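
For orientation, the analogue-to-amino-acid molar ratios implied by the Figure 38 lane conditions can be tabulated as follows. This is only an illustrative sketch: the concentrations are those recited above (80 mM hydroxyproline; 0 to 2.5 mM proline in lanes 3-9), while the function and variable names are hypothetical and no relationship between ratio and percent incorporation is asserted.

```python
HYP_MM = 80.0  # fixed hydroxyproline concentration in lanes 3-9 (mM)

# Proline concentration (mM) by lane, as recited for Figure 38.
PROLINE_MM_BY_LANE = {3: 0.0, 4: 0.025, 5: 0.1, 6: 0.4, 7: 0.8, 8: 1.25, 9: 2.5}


def hyp_to_pro_ratio(hyp_mm: float, pro_mm: float) -> float:
    """Molar ratio of hydroxyproline to proline; infinite when no proline is added."""
    return float("inf") if pro_mm == 0 else hyp_mm / pro_mm


for lane, pro_mm in PROLINE_MM_BY_LANE.items():
    print(f"lane {lane}: Hyp:Pro = {hyp_to_pro_ratio(HYP_MM, pro_mm):.0f}:1")
```
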

[0182] Proline starvation prior to hydroxyproline incorporation is an important technique used herein. It ensures that
no residual proline is present during expression to compete with hydroxyproline. This enables essentially 100%
substitution with the analogue. As shown in Figure 38, starvation conditions allow expression under precisely controlled
ratios of proline and hydroxyproline. The amount of hydroxyproline vs. proline incorporated into the recombinant protein
can therefore be controlled. Thus, particular properties of the recombinant protein that depend upon the relative amount
of analogue incorporated can be tailored by the present methodology to produce polypeptides with unique and
beneficial properties.

[0183] Human collagen, collagen fragments, collagen-like peptides (collagen mimetics) and the above chimeric
polypeptides produced by recombinant processes have distinct advantages over collagen and its derivatives obtained
from non-human animals. Since the human gene is used, the collagen will not act as a xenograft in the context of a
medical implant. Moreover, unlike naturally occurring collagen, the extent of proline hydroxylation can be
predetermined. This unprecedented degree of control permits detailed investigation of the contribution of trans-4-hydroxyproline
to triple helix stabilization, fibril formation and biological activity. In addition, design of medical implants based upon
the desired strength of collagen fibrils is enabled.

[0184] The following examples are included for purposes of illustration and are not to be construed as limitations 
herein. 

EXAMPLE 1

Trans-membrane Transport 

[0185] A 5 mL culture of E. coli strain DH5α (supE44 ΔlacU169 (φ80lacZΔM15) hsdR17 recA1 endA1 gyrA96 thi-1
relA1) containing a plasmid conferring resistance to ampicillin (pMAL-c2, Fig. 1) was grown in Luria Broth to confluency
(~16 hours from inoculation). These cells were used to inoculate a 1 L shaker flask containing 500 mL of M9 minimal
medium (M9 salts, 2% glucose, 0.01 mg/mL thiamine, 100 µg/mL ampicillin supplemented with all amino acids at 20
µg/mL) which was grown to an AU600 of 1.0 (18-20 hours). The culture was divided in half and the cells harvested by
centrifugation. The cells from one culture were resuspended in 250 mL M9 media and those from the other in 250 mL
of M9 media containing 0.5M NaCl. The cultures were equilibrated in an air shaker for 20 minutes at 37°C (225 rpm)
and each divided into ten 25 mL aliquots. The cultures were returned to the shaker and 125 µl of 1M hydroxyproline in
distilled H2O was added to each tube. At 2, 4, 8, 12, and 20 minutes, 4 culture tubes (2 isotonic, 2 hypertonic) were
vacuum filtered onto 1 µm polycarbonate filters that were immediately placed into 2 mL microfuge tubes containing
1.2 mL of 0.2M NaOH/2% SDS in distilled H2O. After overnight lysis, the filters were carefully removed from the tubes,
and the supernatant buffer was assayed for hydroxyproline according to the method of Grant, Journal of Clinical
Pathology, 17:685 (1964). The intracellular concentration of trans-4-hydroxyproline versus time is illustrated graphically
in Figure 2.
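
The extracellular hydroxyproline concentration produced by the addition described above is not stated explicitly but follows from the recited volumes. A minimal sketch of the dilution arithmetic is given below, assuming simple additive volumes; the function name is illustrative only.

```python
def final_conc_mM(stock_M: float, added_uL: float, culture_mL: float) -> float:
    """Final concentration (mM) after adding a stock solution to a culture.

    Assumes volumes are additive.
    """
    added_mL = added_uL / 1000.0
    mmol_added = stock_M * added_mL          # mol/L * mL = mmol
    total_mL = culture_mL + added_mL
    return mmol_added / total_mL * 1000.0    # mmol/mL = M, converted to mM


# 125 µl of 1 M hydroxyproline into a 25 mL aliquot gives about 5 mM.
print(round(final_conc_mM(1.0, 125.0, 25.0), 2))  # 4.98
```
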

EXAMPLE 2

Effects of Salt Concentration on Transmembrane Transport 

[0186] To determine the effects of salt concentration on transmembrane transport, an approach similar to Example
1 was taken. A 5 mL culture of E. coli strain DH5α (supE44 ΔlacU169 (φ80lacZΔM15) hsdR17 recA1 endA1 gyrA96 thi-1
relA1) containing a plasmid conferring resistance to ampicillin (pMAL-c2, Fig. 1) was grown in Luria Broth to confluency
(~16 hours from inoculation). These cells were used to inoculate a 1 L shaker flask containing 500 mL of M9 minimal
medium (M9 salts, 2% glucose, 0.01 mg/mL thiamine, 100 µg/mL ampicillin supplemented with all amino acids at 20
µg/mL) that was then grown to an AU600 of 0.6. The culture was divided into three equal parts, the cells in each collected
by centrifugation and resuspended in 150 mL M9 media, 150 mL M9 media containing 0.5M NaCl, and 150 mL M9
media containing 1.0M NaCl, respectively. The cultures were equilibrated for 20 minutes on a shaker at 37°C (225 rpm)
and then each divided into six 25 mL aliquots. The cultures were returned to the shaker and 125 µl of 1M hydroxyproline
in distilled H2O was added to each tube. At 5 and 15 minutes, 9 culture tubes (3 isotonic, 3 x 0.5M NaCl, and 3 x 1.0M
NaCl) were vacuum filtered onto 1 µm polycarbonate filters that were immediately placed into 2 mL microfuge tubes
containing 1.2 mL of 0.2M NaOH/2% SDS in distilled H2O. After overnight lysis, the filters were removed from the tubes
and the supernatant buffer assayed for hydroxyproline according to the method of Grant, supra.

EXAMPLE 2A 

Effects of Salt Concentration on Transmembrane Transport

[0187] To determine the effects of salt concentration on transmembrane transport, an approach similar to Example
1 was taken. A saturated culture of JM109 (F-) harboring plasmid pD4 (Fig. 48) growing in Luria Broth (LB) containing
100 µg/ml ampicillin (Amp) was used to inoculate 20 ml cultures of LB/Amp to an OD at 600 nm of 0.1 AU. The cultures
were grown with shaking at 37°C to an OD 600 nm between 0.7 and 1.0 AU. Cells were collected by centrifugation
and washed with 10 ml of M9 media. Each cell pellet was resuspended in 20 ml of M9/Amp media supplemented with
0.5% glucose and 100 µg/ml of all of the amino acids except proline. Cultures were grown at 37°C for 30 min. to deplete
endogenous proline. After out-growth, NaCl was added to the indicated concentration, Hyp was added to 40 mM, and
IPTG to 1.5 mM. After 3 hours at 37°C, cells from three 5 ml aliquots of each culture were collected separately on
polycarbonate filters and washed twice with five ml of M9 media containing 0.5% glucose and the appropriate
concentration of NaCl. Cells were lysed in 1 ml of 70% ethanol by vortexing for 30 min. at room temperature. Cell lysis
supernatants were taken to dryness, resuspended in 100 µl of 2.5 N NaOH, and assayed for Hyp by the method of Neuman
and Logan, R.E. Neuman and M.A. Logan, Journal of Biological Chemistry, 184:299 (1950). Total protein was
determined with the BCA kit (Pierce, Rockford, IL) after cell lysis by three sonication/freeze-thaw cycles. The data are the
mean ± standard error of three separate experiments. The intracellular concentration of trans-4-hydroxyproline versus
NaCl concentration is illustrated graphically in Figure 2A.
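
The "mean ± standard error of three separate experiments" reported above may be computed as sketched below. This is an illustration only: the three input values are arbitrary placeholders and are not measurements from this example, the function name is hypothetical, and the standard error is assumed to be the sample standard deviation divided by the square root of the number of experiments.

```python
import math
import statistics


def mean_and_sem(values):
    """Return (mean, standard error of the mean) for replicate measurements."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two replicates are needed")
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(n)  # sample SD / sqrt(n)
    return mean, sem


# Placeholder replicate values (arbitrary units), NOT data from this example.
placeholder_replicates = [1.0, 2.0, 3.0]
mean, sem = mean_and_sem(placeholder_replicates)
print(f"{mean:.2f} ± {sem:.2f}")
```
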
EXAMPLE 3

Determination Of Proline Starvation Conditions in E. Coli 

[0188] Proline auxotrophic E. coli strain NM519 (pro-) including plasmid pMAL-c2 which confers ampicillin resistance
was grown in M9 minimal medium (M9 salts, 2% glucose, 0.01 mg/mL thiamine, 100 µg/mL ampicillin supplemented
with all amino acids at 20 µg/mL except proline which was supplemented at 12.5 mg/L) to a constant AU600 of 0.53
AU (17 hours post-inoculation). Hydroxyproline was added to 0.08M and hydroxyproline-dependent growth was
demonstrated by the increase in the OD600 to 0.61 AU over a one hour period.


EXAMPLE 4 

Hydroxyproline Incorporation Into Protein in E. coli Under Proline Starvation Conditions 

[0189] Plasmid pMAL-c2 (commercially available from New England Biolabs) containing DNA encoding for
maltose-binding protein (MBP) was used to transform proline auxotrophic E. coli strain NM519 (pro-). Two 1 L cultures of
transformed NM519 (pro-) in M9 minimal medium (M9 salts, 2% glucose, 0.01 mg/mL thiamine, 100 µg/mL ampicillin
supplemented with all amino acids at 20 µg/mL except proline which was supplemented at 12.5 mg/L) were grown to an
AU600 of 0.53 (~17 hours post-inoculation). The cells were harvested by centrifugation, the media in one culture was
replaced with an equal volume of M9 media containing 0.08M hydroxyproline and the media in the second culture was
replaced with an equal volume of M9 media containing 0.08M hydroxyproline and 0.5M NaCl. After a one hour
equilibration, the cultures were induced with 1 mM isopropyl-β-D-thiogalactopyranoside. After growing for an additional 3.25
hours, cells were harvested by centrifugation, resuspended in 10 mL of 10 mM Tris-HCl (pH 8), 1 mM EDTA, 100 mM
NaCl (TEN buffer), and lysed by freezing and sonication. MBP was purified by passing the lysates over 4 mL amylose
resin spin columns, washing the columns with 10 mL of TEN buffer, followed by elution of bound MBP with 2 mL of
TEN buffer containing 10 mM maltose. Eluted samples were sealed in ampules under nitrogen with an equal volume
of concentrated HCl (11.7M) and hydrolysed for 12 hours at 120°C. After clarification with activated charcoal,
hydroxyproline content in the samples was determined by HPLC and the method of Grant, supra. The percent incorporation
of trans-4-hydroxyproline compared to proline into MBP is shown graphically in Figure 12.
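
One way such a percent incorporation could be expressed from hydrolysate amino acid amounts is sketched below. The formula (hydroxyproline as a percentage of total hydroxyproline plus proline) is an assumption made for illustration, since the calculation is not set out in this example; the function name and the input values are likewise hypothetical.

```python
def percent_hyp_incorporation(hyp_amount: float, pro_amount: float) -> float:
    """Hydroxyproline as a percentage of (hydroxyproline + proline).

    Both amounts must be in the same units (e.g. nmol per sample).
    """
    total = hyp_amount + pro_amount
    if total <= 0:
        raise ValueError("no proline or hydroxyproline detected")
    return 100.0 * hyp_amount / total


# Hypothetical amounts, for illustration only (not data from this example).
print(percent_hyp_incorporation(3.0, 1.0))  # 75.0
```
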


EXAMPLE 5 

Hydroxyproline Incorporation Into Protein in S. cerevisiae via Integrating Vectors Under Proline Starvation Conditions 

[0190] The procedure described in Example 4 above is performed in yeast using an integrating vector which disrupts
the proline biosynthetic pathway. A gene encoding human Type I (α1) collagen is inserted into a unique shuttle vector
behind the inducible GAL10 promoter. This promoter/gene cassette is flanked by a 5' and 3' terminal sequence derived
from a S. cerevisiae proline synthetase gene. The plasmid is linearized by restriction digestion in both the 5' and 3'
terminal regions and used to transform a proline-prototrophic S. cerevisiae strain. The transformation mixture is plated
onto selectable media and transformants are selected. By homologous recombination and gene disruption, the
construct simultaneously forms a stable integration and converts the S. cerevisiae strain into a proline auxotroph. A single
transformant is selected and grown at 30°C in YPD media to an OD600 of 2 AU. The culture is centrifuged and the
cells resuspended in yeast dropout media supplemented with all amino acids except proline and grown to a constant
OD600 indicating proline starvation conditions. 0.08M L-hydroxyproline and 2% (w/v) galactose are then added. Cultures
are grown for an additional 6-48 hours. Cells are harvested by centrifugation (5000 rpm, 10 minutes) and lysed by
mechanical disruption. Hydroxyproline-containing human Type I (α1) collagen is purified by ammonium sulfate
fractionation and column chromatography.

EXAMPLE 6 

Hydroxyproline Incorporation Into Protein in S. cerevisiae via Non-Integrating Vectors Under Proline Starvation Conditions

[0191] The procedure described above in Example 4 is performed in a yeast proline auxotroph using a non-integrating
vector. A gene encoding human Type I (α1) collagen is inserted behind the inducible GAL10 promoter in the YEp24
shuttle vector that contains the selectable Ura+ marker. The resulting plasmid is transformed into proline auxotrophic
S. cerevisiae by spheroplast transformation. The transformation mixture is plated on selectable media and
transformants are selected. A single transformant is grown at 30°C in YPD media to an OD600 of 2 AU. The culture is centrifuged
and the cells resuspended in yeast dropout media supplemented with all amino acids except proline and grown to a
constant OD600 indicating proline starvation conditions. 0.08M L-hydroxyproline and 2% (w/v) galactose are then added.
Cultures are grown for an additional 6-48 hours. Cells are harvested by centrifugation (5000 rpm, 10 minutes) and
lysed by mechanical disruption. Hydroxyproline-containing human Type I (α1) collagen is purified by ammonium sulfate
fractionation and column chromatography.

EXAMPLE 7 

Hydroxyproline Incorporation Into Protein in a Baculovirus Expression System 

[0192] A gene encoding human Type I (α1) collagen is inserted into the pBacPAK8 baculovirus expression vector
behind the AcMNPV polyhedrin promoter. This construct is co-transfected into SF9 cells along with linearized AcMNPV
DNA by standard calcium phosphate co-precipitation. Transfectants are cultured for 4 days at 27°C in TNM-FH media
supplemented with 10% FBS. The media is harvested and recombinant virus particles are isolated by a plaque assay.
Recombinant virus is used to infect 1 liter of SF9 cells growing in Grace's media minus proline supplemented with 10%
FBS and 0.08 M hydroxyproline. After growth at 27°C for 2-10 days, cells are harvested by centrifugation and lysed
by mechanical disruption. Hydroxyproline-containing human Type I (α1) collagen is purified by ammonium sulfate
fractionation and column chromatography.

EXAMPLE 8 

Hydroxyproline Incorporation Into Human Collagen Protein in Escherichia coli Under Proline Starvation Conditions 

[0193] A plasmid (pHuCol, Fig. 4) encoding the gene sequence of human Type I (α1) collagen (Figures 3A and 3B)
(SEQ. ID. NO. 1) placed behind the isopropyl-β-D-thiogalactopyranoside (IPTG)-inducible tac promoter and also
encoding β-lactamase is transformed into Escherichia coli proline auxotrophic strain NM519 (pro-) by standard heat shock
transformation. Transformation cultures are plated on Luria Broth (LB) containing 100 µg/ml ampicillin and after
overnight growth a single ampicillin-resistant colony is used to inoculate 5 ml of LB containing 100 µg/ml ampicillin. After
growth for 10-16 hours with shaking (225 rpm) at 37°C, this culture is used to inoculate 1 L of M9 minimal medium
(M9 salts, 2% glucose, 0.01 mg/mL thiamine, 100 µg/mL ampicillin, supplemented with all amino acids at 20 µg/mL
except proline which is supplemented at 12.5 mg/L) in a 1.5 L shaker flask. After growth at 37°C, 225 rpm, for 15-20
hours post-inoculation, the optical density at 600 nm is constant at approximately 0.5 OD/mL. The cells are harvested
by centrifugation (5000 rpm, 5 minutes), the media decanted, and the cells resuspended in 1 L of M9 minimal media
containing 100 µg/mL ampicillin, 0.08M L-hydroxyproline, and 0.5M NaCl. Following growth for 1 hour at 37°C, 225
rpm, IPTG is added to 1 mM and the cultures allowed to grow for an additional 5-15 hours. Cells are harvested by
centrifugation (5000 rpm, 10 minutes) and lysed by mechanical disruption. Hydroxyproline-containing collagen is
purified by ammonium sulfate fractionation and column chromatography.

EXAMPLE 9

Hydroxyproline Incorporation Into Fragments of Human Collagen Protein in Escherichia coli Under Proline Starvation 
Conditions 

[0194] A plasmid (pHuCol-F1, Figure 6) encoding the gene sequence of the first 80 amino acids of human Type I
(α1) collagen (Figure 5) (SEQ. ID. NO. 2) placed behind the isopropyl-β-D-thiogalactopyranoside (IPTG)-inducible tac
promoter and also encoding β-lactamase is transformed into Escherichia coli proline auxotrophic strain NM519 (pro-)
by standard heat shock transformation. Transformation cultures are plated on Luria Broth (LB) containing 100 µg/mL
ampicillin and after overnight growth a single ampicillin-resistant colony is used to inoculate 5 mL of LB containing 100
µg/mL ampicillin. After growth for 10-16 hours with shaking (225 rpm) at 37°C, this culture is used to inoculate 1 L of
M9 minimal medium (M9 salts, 2% glucose, 0.01 mg/mL thiamine, 100 µg/mL ampicillin, supplemented with all amino
acids at 20 µg/mL except proline which is supplemented at 12.5 mg/L) in a 1.5 L shaker flask. After growth at 37°C,
225 rpm, for 15-20 hours post-inoculation, the optical density at 600 nm is constant at approximately 0.5 OD/mL. The
cells are harvested by centrifugation (5000 rpm, 5 minutes), the media decanted, and the cells resuspended in 1 L of
M9 minimal media containing 100 µg/mL ampicillin, 0.08M L-hydroxyproline, and 0.5M NaCl. Following growth for 1
hour at 37°C, 225 rpm, IPTG is added to 1 mM and the cultures allowed to grow for an additional 5-15 hours. Cells
are harvested by centrifugation (5000 rpm, 10 minutes) and lysed by mechanical disruption. The
hydroxyproline-containing collagen fragment is purified by ammonium sulfate fractionation and column chromatography.
EXAMPLE 10

Construction and Expression in E. coli of the Human Collagen Type I (α1) Gene with Optimized E. coli Codon Usage

A. Construction of the gene:

[0195] The nucleotide sequence of the helical region of human collagen Type I (α1) gene flanked by 17 amino acids
of the amino terminal extra-helical and 26 amino acids of the C-terminal extra-helical region is shown in Figure 27
(SEQ. ID. NO. 15). A tabulation of the codon frequency of this gene is given in Table I. The gene sequence shown in
Figure 27 was first changed to reflect E. coli codon bias. An initiating methionine was inserted at the 5' end of the gene
and a TAAT stop sequence at the 3' end. Unique restriction sites were identified or created approximately every 150
base pairs. The resulting gene (HuColEc, Figure 39A-39E) (SEQ. ID. NO. 20) has the codon usage given in Table II
as shown below. Other sequences that approximate E. coli codon bias are also acceptable.
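
The general approach of recoding a protein sequence with host-preferred codons and adding the initiating ATG and the TAAT stop sequence can be sketched as follows. This is a simplified illustration and not the procedure actually used to design the gene of Figure 39A-39E: the three-entry codon map is a hypothetical stand-in (it is not Table II), the function name is illustrative, and the creation of unique restriction sites is not modeled.

```python
# Hypothetical, deliberately tiny codon map for illustration only.
# It is NOT Table II; a real design would cover all twenty amino acids.
ILLUSTRATIVE_ECOLI_CODONS = {
    "G": "GGT",  # glycine
    "P": "CCG",  # proline
    "A": "GCG",  # alanine
}

START_CODON = "ATG"   # initiating methionine added at the 5' end
STOP_SEQUENCE = "TAAT"  # stop sequence added at the 3' end


def back_translate(protein: str, codon_map=ILLUSTRATIVE_ECOLI_CODONS) -> str:
    """Recode a protein sequence using one chosen codon per amino acid."""
    try:
        body = "".join(codon_map[aa] for aa in protein)
    except KeyError as missing:
        raise ValueError(f"no codon defined for residue {missing}") from None
    return START_CODON + body + STOP_SEQUENCE


# A single Gly-Pro-Ala triplet as a toy input.
print(back_translate("GPA"))  # ATGGGTCCGGCGTAAT
```
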
[0196] Oligos of approximately 80 nucleotides were synthesized on a Beckman Oligo 1000 DNA synthesizer, cleaved
and deprotected with aqueous NH4OH, and purified by electrophoresis in 7M urea/12% polyacrylamide gels. Each set
of oligos was designed to have an EcoR I restriction enzyme site at the 5' end, a unique restriction site near the 3' end,
followed by the TAAT stop sequence and a Hind III restriction enzyme site at the very 3' end. The first four oligos,
comprising the first 81 amino acids of the human collagen Type I (α1) gene, are given in Figure 40 which shows the
sequence and restriction maps of synthetic oligos used to construct the first 243 base pairs of the human Type I (α1)
collagen gene with optimized E. coli codon usage. Oligos N1-1 (SEQ. ID. NO. 21) and N1-2 (SEQ. ID. NO. 22) were
designed to insert an initiating methionine (ATG) codon at the 5' end of the gene.

[0197] In one instance, oligos N1-1 and N1-2 (1 µg each) were annealed in 20 µL of T7 DNA polymerase buffer
(40 mM Tris-HCl (pH 8.0), 5 mM MgCl2, 5 mM dithiothreitol, 50 mM NaCl, 0.05 mg/mL bovine serum albumin) by heating
at 90°C for 5 minutes followed by slow cooling to room temperature. After brief centrifugation at 14,000 rpm, 10 units
of T7 DNA polymerase and 2 µL of a solution of all four dNTPs (dATP, dGTP, dCTP, dTTP, 2.5 mM each) were added
to the annealed oligos. Extension reactions were incubated at 37°C for 30 minutes and then heated at 70°C for 10
minutes. After cooling to room temperature, Hind III buffer (5 µL of 10x concentration), 20 µL of H2O, and 10 units of
Hind III restriction enzyme were added and the tubes incubated at 37°C for 10 hours. Hind III buffer (2 µL of 10x
concentration), 13.5 µL of 0.5M Tris-HCl (pH 7.5), 1.8 µL of 1% Triton X100, 5.6 µL of H2O, and 20 U of EcoR I were added
to each tube and incubation continued for 2 hours at 37°C. Digests were extracted once with an equal volume of phenol,
once with phenol/chloroform/isoamyl alcohol, and once with chloroform/isoamyl alcohol. After ethanol precipitation,
the pellet was resuspended in 10 µL of TE buffer (10 mM Tris-HCl (pH 8.0), 1 mM EDTA). Resuspended pellet (4 µL)
was ligated overnight at 16°C with agarose gel-purified EcoR I/Hind III digested pBSKS+ vector (1 µg) using T4 DNA
ligase (100 units). One half of the ligation mixture was transformed by heat shock into DH5α cells and 100 µL
of the 1.0 mL transformation mixture was plated on Luria Broth (LB) agar plates containing 70 µg/mL ampicillin. Plates
were incubated overnight at 37°C. Ampicillin resistant colonies (6-12) were picked and grown overnight in LB media
containing 70 µg/mL ampicillin. Plasmid DNA was isolated from each culture by Wizard Minipreps (Promega
Corporation, Madison, WI) and screened for the presence of the approximately 120 base pair insert by digestion with EcoR
I and Hind III and running the digestion products on agarose electrophoresis gels. Clones with inserts were confirmed
by standard dideoxy termination DNA sequencing. The correct clone was named pBSN1-1 (Figure 41) and the collagen
fragment has the nucleic acid sequence given in Figure 42 (SEQ. ID. NO. 25).
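
The annealing and polymerase extension of a pair of oligos can be modeled in outline as follows. This is a conceptual sketch only, assuming that the two oligos are complementary over their 3' ends so that each primes fill-in synthesis on the other; the toy sequences, the minimum overlap length and the function names are hypothetical and are not the sequences of oligos N1-1 and N1-2.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def extend_pair(top: str, bottom: str, min_overlap: int = 6) -> str:
    """Top-strand sequence of the duplex formed by annealing and fill-in.

    `top` and `bottom` are both written 5'->3'. The 3' end of `top` must be
    complementary to the 3' end of `bottom` over at least `min_overlap` bases.
    """
    bottom_as_top = revcomp(bottom)  # bottom oligo expressed in top-strand sense
    longest = min(len(top), len(bottom_as_top))
    for k in range(longest, min_overlap - 1, -1):
        if top[-k:] == bottom_as_top[:k]:
            return top + bottom_as_top[k:]
    raise ValueError("oligos do not share a sufficient 3' overlap")


# Toy oligos sharing a 6-base overlap (CCGGCG).
top_oligo = "ATGGGTCCGGCG"
bottom_oligo = revcomp("CCGGCGGGTAAA")
print(extend_pair(top_oligo, bottom_oligo))  # ATGGGTCCGGCGGGTAAA
```
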

[0198] Oligos N1-3 (SEQ. ID. NO. 23) and N1-4 (SEQ. ID. NO. 24) (Figure 40) were synthesized, purified, annealed,
extended, and cloned into pBSKS+ following the same procedure given above for oligos N1-1 and N1-2. The resulting
plasmid was named pBSN1-2A. To clone together the sections of the collagen gene from pBSN1-1 and pBSN1-2A,
plasmid pBSN1-1 (1 µg) was digested for 2 hours at 37°C with Rsr II and Hind III. The digested vector was purified by
agarose gel electrophoresis. Plasmid pBSN1-2A (3 µg) was digested for 2 hours at 37°C with Rsr II and Hind III and
the insert purified by agarose gel electrophoresis. Rsr II/Hind III-digested pBSN1-1 was ligated with this insert overnight
at 16°C with T4 DNA ligase. One half of the ligation mixture was transformed into DH5α cells and 1/10 of the
transformation mixture was plated on LB agar plates containing 70 µg/mL ampicillin. After overnight incubation at 37°C,
ampicillin-resistant clones were picked and screened for the presence of insert DNA as described above. Clones were
confirmed by dideoxy termination sequencing. The correct clone was named pBSN1-2 (Figure 43) and the collagen
fragment has the sequence given in Figure 44.
[0199] In a similar manner, the remainder of the collagen gene is constructed such that the final DNA sequence is that
given in Figures 39A-39E (SEQ. ID. NO. 19).

B) Expression of the gene in E. coli: 

[0200] Following construction of the entire human collagen Type I (α1) gene with codon usage optimized for E. coli,
the cloned gene is expressed in E. coli. A plasmid (pHuCol Ec, Figure 45) encoding the entire synthetic collagen gene
(Figures 39A-39E) placed behind the isopropyl-β-D-thiogalactopyranoside (IPTG)-inducible tac promoter and also
encoding β-lactamase is transformed into Escherichia coli strain DH5α (supE44 ΔlacU169 (φ80lacZ ΔM15) hsdR17 recA1
endA1 gyrA96 thi-1 relA1) by standard heat shock transformation. Transformation cultures are plated on Luria Broth
(LB) containing 100 µg/mL ampicillin and after overnight growth a single ampicillin-resistant colony is used to inoculate
10 mL of LB containing 100 µg/mL ampicillin. After growth for 10-16 hours with shaking (225 rpm) at 37°C, this culture
is used to inoculate 1 L of LB containing 100 µg/mL ampicillin in a 1.5 L shaker flask. After growth at 37°C, 225 rpm,
for 2 hours post-inoculation, the optical density at 600 nm is approximately 0.5 OD/mL. IPTG is added to 1 mM and the
culture is allowed to grow for an additional 5-10 hours. Cells are harvested by centrifugation (5000 rpm, 10 minutes) and
lysed by mechanical disruption. Recombinant human collagen is purified by ammonium sulfate fractionation and column
chromatography. The yield is typically 15-25 mg/L of culture.
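
As an illustration of the induction step above, the volume of IPTG stock needed to reach 1 mM can be worked out from C1·V1 = C2·V2. The sketch below assumes a 1 M IPTG stock, which is not stated in this description, and ignores the small increase in culture volume; the function name is hypothetical.

```python
def stock_volume_ml(final_mM, culture_volume_ml, stock_mM):
    """Volume of stock solution (mL) that gives final_mM in the culture.

    Uses C1*V1 = C2*V2 and neglects the volume added by the stock itself.
    """
    return final_mM * culture_volume_ml / stock_mM


# Assumption: a 1 M (1000 mM) IPTG stock; the text only gives the final 1 mM.
iptg_ml = stock_volume_ml(final_mM=1.0, culture_volume_ml=1000.0, stock_mM=1000.0)
print(f"Add {iptg_ml:.2f} mL of IPTG stock to the 1 L culture")
```

The same helper applies to the smaller cultures in the later examples by changing `culture_volume_ml`.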

EXAMPLE 11 

Expression in E. coli of an 81 Amino Acid Fragment of Human Collagen Type I (α1) with Optimized E. coli Codon Usage

[0201] A plasmid (pTrcN1-2, Figure 46) encoding the gene sequence of the first 81 amino acids of human Type I
(α1) collagen with optimized E. coli codon usage cloned in fusion with a 6 histidine tag at the 5' end of the gene and
placed behind the isopropyl-β-D-thiogalactopyranoside (IPTG)-inducible trc promoter and also encoding β-lactamase
was constructed by subcloning the EcoR I/Hind III insert from pBSN1-2 into the EcoR I/Hind III site of plasmid pTrcB
(Invitrogen, San Diego, CA). Plasmid pTrcN1-2 was transformed into Escherichia coli strain DH5α (supE44 ΔlacU169
(φ80lacZ ΔM15) hsdR17 recA1 endA1 gyrA96 thi-1 relA1) by standard heat shock transformation. Transformation
cultures were plated on Luria Broth (LB) containing 100 µg/mL ampicillin and after overnight growth a single ampicillin-resistant
colony was used to inoculate 5 mL of LB containing 100 µg/mL ampicillin. After growth for 10-16 hours with
shaking (225 rpm) at 37°C, this culture was used to inoculate 50 mL of LB containing 100 µg/mL ampicillin in a 250
mL shaker flask. After growth at 37°C, 225 rpm, for 2 hours post-inoculation, the optical density at 600 nm was
approximately 0.5 OD/mL. IPTG was added to 1 mM and the culture was allowed to grow for an additional 5-10 hours. Cells
were harvested by centrifugation (5000 rpm, 10 minutes) and stored at -20°C. The 6 histidine tag-collagen fragment
fusion was purified on nickel resin columns. Cell pellets were resuspended in 10 mL of 6M guanidine hydrochloride/
20mM sodium phosphate/500mM NaCl (pH 7.8) and bound in two 5 mL batches to the nickel resin. Columns were
washed two times with 4 mL of binding buffer (8M urea/20mM sodium phosphate/500mM NaCl (pH 7.8)), two times
with wash buffer 1 (8M urea/20mM sodium phosphate/500mM NaCl (pH 6.0)), and two times with wash buffer 2 (8M
urea/20mM sodium phosphate/500mM NaCl (pH 5.3)). The 6 histidine tag-collagen fragment fusion was eluted from
the column with 5 mL of elution buffer (8M urea/20mM sodium phosphate/500mM NaCl (pH 4.0)) in 1 mL fractions.
Fractions were assessed for protein by gel electrophoresis and fusion-containing fractions were concentrated and
stored at -20°C. The yield was typically 15-25 mg/L of culture.
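
For orientation, the stated yield of 15-25 mg/L can be scaled to the 50 mL culture used in this example. This is only a back-of-the-envelope sketch; it assumes the yield scales linearly with culture volume, which the description does not claim, and the function name is hypothetical.

```python
def expected_yield_mg(culture_volume_ml, low_mg_per_l=15.0, high_mg_per_l=25.0):
    """Return the (low, high) expected protein mass in mg for a culture volume.

    Assumes the per-litre yield range quoted in the text scales linearly.
    """
    litres = culture_volume_ml / 1000.0
    return (low_mg_per_l * litres, high_mg_per_l * litres)


low_mg, high_mg = expected_yield_mg(50.0)
print(f"Expected fusion protein from 50 mL: {low_mg:.2f} to {high_mg:.2f} mg")
```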

[0202] The collagen is cleaved from the 6 histidine tag with enterokinase. Fusion-containing fractions are dialyzed
against cleavage buffer (50mM Tris HCl, pH 8.0/5mM CaCl2). After addition of enterokinase at 1 µg enzyme for each
100 µg fusion, the solution is incubated at 37°C for 4-10 hours. Progress of the cleavage is monitored by gel
electrophoresis. The cleaved 6 histidine tag may be separated from the collagen fragment by passage over a nickel resin
column as outlined above.

EXAMPLE 12

Expression in E. coli of Fragments of Human Collagen Type I (α1) with Optimized E. coli Codon Usage

[0203] A plasmid (pN1-3, Figure 47) encoding the gene for the amino terminal 120 amino acids of human collagen
Type I (α1) with optimized E. coli codon usage placed behind the isopropyl-β-D-thiogalactopyranoside (IPTG)-inducible
tac promoter and also encoding β-lactamase is transformed into Escherichia coli strain DH5α (supE44 ΔlacU169
(φ80lacZ ΔM15) hsdR17 recA1 endA1 gyrA96 thi-1 relA1) by standard heat shock transformation. Transformation
cultures are plated on Luria Broth (LB) containing 100 µg/mL ampicillin and after overnight growth a single ampicillin-resistant
colony is used to inoculate 10 mL of LB containing 100 µg/mL ampicillin. After growth for 10-16 hours with
shaking (225 rpm) at 37°C, this culture is used to inoculate 1 L of LB containing 100 µg/mL ampicillin in a 1.5 L shaker
flask. After growth at 37°C, 225 rpm, for 2 hours post-inoculation, the optical density at 600 nm is approximately 0.5
OD/mL. IPTG is added to 1 mM and the culture is allowed to grow for an additional 5-10 hours. Cells are harvested by
centrifugation (5000 rpm, 10 minutes) and lysed by mechanical disruption. Recombinant human collagen is purified
by ammonium sulfate fractionation and column chromatography. The yield is typically 15-25 mg/L of culture.


EXAMPLE 13 

Expression in E. coli of a C-terminal Fragment of Human Collagen Type I (α1) with Optimized E. coli Codon Usage

[0204] A plasmid (pD4, Figure 48) encoding the gene for the carboxy terminal 219 amino acids of human collagen
Type I (α1) with optimized E. coli codon usage placed behind the isopropyl-β-D-thiogalactopyranoside (IPTG)-inducible
tac promoter and also encoding β-lactamase is transformed into Escherichia coli strain DH5α (supE44 ΔlacU169
(φ80lacZ ΔM15) hsdR17 recA1 endA1 gyrA96 thi-1 relA1) by standard heat shock transformation. Transformation
cultures are plated on Luria Broth (LB) containing 100 µg/mL ampicillin and after overnight growth a single ampicillin-resistant
colony is used to inoculate 10 mL of LB containing 100 µg/mL ampicillin. After growth for 10-16 hours with
shaking (225 rpm) at 37°C, this culture is used to inoculate 1 L of LB containing 100 µg/mL ampicillin in a 1.5 L shaker
flask. After growth at 37°C, 225 rpm, for 2 hours post-inoculation, the optical density at 600 nm is approximately 0.5
OD/mL. IPTG is added to 1 mM and the culture is allowed to grow for an additional 5-10 hours. Cells are harvested by
centrifugation (5000 rpm, 10 minutes) and lysed by mechanical disruption. Recombinant human collagen fragment is
purified by ammonium sulfate fractionation and column chromatography. The yield is typically 15-25 mg/L of culture.

EXAMPLE 14

Construction and Expression in E. coli of the Human Collagen Type I (α2) Gene with Optimized E. coli Codon Usage

A) Construction of the gene:

[0205] The nucleotide sequence of the helical region of human collagen Type I (α2) gene flanked by 11 amino acids
of the amino terminal extra-helical and 12 amino acids of the C-terminal extra-helical region is shown in Figures 49A-
49E (SEQ. ID. NO. 29). A tabulation of the codon frequency of this gene is given in Table III below. The gene sequence
shown in Figures 49A-49E was first changed to reflect E. coli codon bias. An initiating methionine was inserted at the
5' end of the gene and a TAAT stop sequence at the 3' end. Unique restriction sites are identified or created
approximately every 150 base pairs. The resulting gene (HuCol(α2) Ec, Figures 50A-50E) (SEQ. ID. NO. 31) has the codon
usage given in Table IV below. Other sequences that approximate E. coli codon bias are also acceptable.
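
The redesign described above (recoding for E. coli bias, adding an initiating ATG and a TAAT stop sequence) can be sketched in a few lines. The codon choices in `PREFERRED` are illustrative assumptions, one commonly used E. coli codon per amino acid, and are not taken from Table IV. The sketch also back-translates from a protein sequence rather than editing the native DNA, which is a simplification of the procedure in the text.

```python
# Illustrative single "preferred" E. coli codon per amino acid (an assumption,
# not the codon usage of Table IV).
PREFERRED = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGT", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}


def back_translate(protein, start="ATG", stop="TAAT"):
    """Build a coding sequence from one-letter amino acids.

    An initiating methionine codon is placed at the 5' end and the TAAT
    stop sequence at the 3' end, as described in the text.
    """
    body = "".join(PREFERRED[aa] for aa in protein.upper())
    return start + body + stop


# A short Gly-X-Y collagen-like stretch used purely as a toy input.
gene = back_translate("GPPGAP")
print(gene)
```

A real design would vary codons to match a target frequency table and to create the unique restriction sites mentioned above; this sketch does neither.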

Table III

Table IV

[0206] Oligos of approximately 80 nucleotides are synthesized on a Beckman Oligo 1000 DNA synthesizer, cleaved
and deprotected with aqueous NH4OH, and purified by electrophoresis in 7M urea/12% polyacrylamide gels. Each set
of oligos is designed to have an EcoR I restriction enzyme site at the 5' end, a unique restriction site near the 3' end,
followed by the TAAT stop sequence and a Hind III restriction enzyme site at the very 3' end. Oligos N1-1(α2) and
N1-2(α2) are designed to insert an initiating methionine (ATG) codon at the 5' end of the gene.
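
The fragment layout described above (EcoR I site at the 5' end, TAAT stop sequence and Hind III site at the 3' end) lends itself to a simple check that each flanking site occurs exactly once. The recognition sequences GAATTC (EcoR I) and AAGCTT (Hind III) are standard; the example fragment and the function names are hypothetical and are not taken from the figures.

```python
SITES = {"EcoRI": "GAATTC", "HindIII": "AAGCTT"}


def site_positions(seq, site):
    """Return the 0-based start positions of every occurrence of site in seq."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)  # allow overlapping matches
    return positions


def is_unique(seq, site):
    """True when the recognition sequence occurs exactly once."""
    return len(site_positions(seq, site)) == 1


# Toy fragment: EcoR I site, short coding stretch, TAAT stop, Hind III site.
fragment = "GAATTC" + "ATGGGTCCGCCGGGTGCGCCG" + "TAAT" + "AAGCTT"
for name, site in SITES.items():
    print(name, site_positions(fragment, site), is_unique(fragment, site))
```

Both sites used here are palindromic, so scanning one strand is sufficient for this check.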

[0207] In one instance, oligos N1-1(α2) and N1-2(α2) (1 µg each) (Figure 51 depicts sequence and restriction maps
of synthetic oligos used to construct the first 240 base pairs of human Type I (α2) collagen gene with optimized E. coli
codon usage) are annealed in 20 µL of T7 DNA polymerase buffer (40mM Tris HCl (pH 8.0), 5mM MgCl2, 5mM
dithiothreitol, 50mM NaCl, 0.05 mg/mL bovine serum albumin) by heating at 90°C for 5 minutes followed by slow cooling
to room temperature. After brief centrifugation at 14,000 rpm, 10 units of T7 DNA polymerase and 2 µL of a solution
of all four dNTPs (dATP, dGTP, dCTP, dTTP, 2.5mM each) are added to the annealed oligos. Extension reactions are
incubated at 37°C for 30 minutes and then heated at 70°C for 10 minutes. After cooling to room temperature, Hind III
buffer (5 µL of 10x concentration), 20 µL of H2O, and 10 units of Hind III restriction enzyme are added and the tubes
incubated at 37°C for 10-16 hours. Hind III buffer (2 µL of 10x concentration), 13.5 µL of 0.5 M Tris HCl (pH 7.5), 1.8 µL
of 1% Triton X100, 5.6 µL of H2O, and 20 U of EcoR I are added to each tube and incubation continued for 2 hours at
37°C. Digests are extracted once with an equal volume of phenol, once with phenol/chloroform/isoamyl alcohol, and
once with chloroform/isoamyl alcohol. After ethanol precipitation, the pellet is resuspended in 10 µL of TE buffer (10mM
Tris HCl (pH 8.0), 1 mM EDTA). Resuspended pellet (4 µL) is ligated overnight at 16°C with agarose gel-purified EcoR I/
Hind III digested pBSKS+ vector (1 µg) using T4 DNA ligase (100 units). One half of the ligation mixture is
transformed by heat shock into DH5α cells and 100 µL of the 1.0 mL transformation mixture is plated on Luria Broth
(LB) agar plates containing 70 µg/mL ampicillin. Plates are incubated overnight at 37°C. Ampicillin-resistant colonies
(6-12) are picked and grown overnight in LB media containing 70 µg/mL ampicillin. Plasmid DNA is isolated from each
culture by Wizard Minipreps (Promega Corporation, Madison, WI) and screened for the presence of the approximately
120 base pair insert by digestion with EcoR I and Hind III and running the digestion products on agarose electrophoresis
gels. Clones with inserts are confirmed by standard dideoxy termination DNA sequencing. The correct clone is named
pBSN1-1(α2) (Figure 52).
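
The anneal-and-extend step above relies on two oligos whose 3' ends are complementary, so that the polymerase can fill in both strands. A minimal sketch of that logic follows; the sequences are toy examples, the 8-nucleotide minimum overlap is an arbitrary assumption, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def fill_in(top, bottom, min_overlap=8):
    """Top strand of the duplex formed by annealing two oligos at their 3' ends.

    top and bottom are both written 5'->3'. The bottom oligo is converted to
    top-strand coordinates, the longest 3' overlap is located, and the
    single-stranded regions are filled in, as a polymerase would do.
    """
    top = top.upper()
    bottom_as_top = revcomp(bottom)
    for k in range(min(len(top), len(bottom_as_top)), min_overlap - 1, -1):
        if top[-k:] == bottom_as_top[:k]:
            return top + bottom_as_top[k:]
    raise ValueError("no 3' overlap of the required length")


# Toy design: the two oligos share an 8-nucleotide overlap.
full_length = "GAATTCATGGGTCCGCCGGGTGCGCCGTAATAAGCTT"
top_oligo = full_length[:18]
bottom_oligo = revcomp(full_length[10:])
duplex_top = fill_in(top_oligo, bottom_oligo)
print(duplex_top == full_length)
```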

[0208] Oligos N1-3(α2) and N1-4(α2) are synthesized, purified, annealed, extended, and cloned into pBSKS+
following the same procedure given above for oligos N1-1(α2) and N1-2(α2). The resulting plasmid is named pBSN1-2A. To
clone together the sections of the collagen gene, plasmid pBSN1-1(α2) (1 µg) is digested for 2 hours at 37°C with BsrF I
and Hind III. The digested vector is purified by agarose gel electrophoresis. Plasmid pBSN1-2(α2) (3 µg) is digested
for 2 hours at 37°C with BsrF I and Hind III and the insert purified by agarose gel electrophoresis. BsrF I/Hind III-digested
pBSN1-1 is ligated with this insert overnight at 16°C with T4 DNA ligase. One half of the ligation mixture is
transformed into DH5α cells and 1/10 of the transformation mixture is plated on LB agar plates containing 70 µg/mL
ampicillin. After overnight incubation at 37°C, ampicillin-resistant clones are picked and screened for the presence of
insert DNA as described above. Clones are confirmed by dideoxy termination sequencing. The correct clone is named
pBSN1-2(α2) (Figure 53) and the collagen fragment has the sequence given in Figure 54 (SEQ. ID. NO. 37).
[0209] In a similar manner, the remainder of the collagen gene is constructed such that the final DNA sequence is
that given in Figures 50A-50E (SEQ. ID. NO. 31).

B) Expression of the gene in E. coli:

[0210] Following construction of the entire human collagen Type I (α2) gene with codon usage optimized for E. coli,
the cloned gene is expressed in E. coli. A plasmid (pHuCol(α2) Ec, Figure 55) encoding the entire synthetic collagen
gene (Figures 50A-50E) placed behind the isopropyl-β-D-thiogalactopyranoside (IPTG)-inducible tac promoter and
also encoding β-lactamase is transformed into Escherichia coli strain DH5α (supE44 ΔlacU169 (φ80lacZ ΔM15) hsdR17
recA1 endA1 gyrA96 thi-1 relA1) by standard heat shock transformation. Transformation cultures are plated on Luria
Broth (LB) containing 100 µg/mL ampicillin and after overnight growth a single ampicillin-resistant colony is used to
inoculate 10 mL of LB containing 100 µg/mL ampicillin. After growth for 10-16 hours with shaking (225 rpm)
at 37°C, this culture is used to inoculate 1 L of LB containing 100 µg/mL ampicillin in a 1.5 L shaker flask. After growth
at 37°C, 225 rpm, for 2 hours post-inoculation, the optical density at 600 nm is approximately 0.5 OD/mL. IPTG is
added to 1 mM and the culture is allowed to grow for an additional 5-10 hours. Cells are harvested by centrifugation (5000
rpm, 10 minutes) and lysed by mechanical disruption. Recombinant human collagen is purified by ammonium sulfate
fractionation and column chromatography. The yield is typically 15-25 mg/L of culture.


EXAMPLE 14A 

Alternative Construction and Expression in E. coli of the Human Collagen Type I (α2) Gene with Optimized E. coli
Codon Usage


A) Construction of the gene: 

[0211] The nucleotide sequence of the helical region of human collagen Type I (α2) gene flanked by 11 amino acids
of the amino terminal extra-helical and 12 amino acids of the C-terminal extra-helical region is shown in Figures 49A-
49E (SEQ. ID. NO. 29). A tabulation of the codon frequency of this gene is given in Table III. The gene sequence shown
in Figures 49A-49E was first changed to reflect E. coli codon bias. An initiating methionine was inserted at the 5' end
of the gene and a TAAT stop sequence at the 3' end. Unique restriction sites were identified or created at appropriate
locations in the gene (approximately every 150 base pairs). The resulting gene (HuCol(α2) Ec, Figures 50A-50E) (SEQ.
ID. NO. 31) has the codon usage given in Table IV. Other sequences that approximate E. coli codon bias are also
acceptable.
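
A codon frequency tabulation of the kind referred to for Tables III and IV can be produced by counting in-frame triplets. The sketch below uses a short toy sequence, not the sequences of Figures 49A-49E or 50A-50E, and the function name is hypothetical.

```python
from collections import Counter


def codon_counts(cds):
    """Count in-frame codons in a coding sequence read from its first base."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return Counter(cds[i:i + 3] for i in range(0, len(cds), 3))


counts = codon_counts("ATGGGTCCGCCGGGTGCGCCG")
total = sum(counts.values())
for codon, n in counts.most_common():
    print(f"{codon}\t{n}\t{n / total:.2f}")
```

Comparing such counts before and after recoding is one way to see how closely a redesigned gene follows the host's codon bias.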

[0212] Oligonucleotides were synthesized on a Beckman Oligo 1000 DNA synthesizer, cleaved and deprotected with
aqueous NH4OH, and purified by electrophoresis in 7M urea/12% polyacrylamide gels. Purified oligos (32.5 pmol) were
dissolved in 20 µL of ligation buffer (Boehringer Mannheim, Cat No. 1635 379) and annealed by heating to 95°C followed
by slow cooling to 20°C over 45 minutes. The annealed oligonucleotides were ligated for 5 minutes at room temperature
with digested vector (1 µg) using T4 DNA ligase (5 units). One half of the ligation mixture was transformed by
heat shock into DH5α cells and 100 µL of the 1.0 mL transformation mixture was plated on Luria Broth (LB) agar plates
containing 70 µg/mL ampicillin. Plates were incubated overnight at 37°C. Ampicillin-resistant colonies (6-12) were picked
and grown overnight in LB media containing 70 µg/mL ampicillin. Plasmid DNA was isolated from each culture by
QIAprep Miniprep (Qiagen, Valencia, CA) and screened for the presence of insert by digestion with flanking restriction
enzymes and running the digestion products on agarose electrophoresis gels. Clones with inserts were confirmed by
standard dideoxy termination DNA sequencing. To clone together the sections of the collagen gene, an insert covering
a flanking portion of the gene was ligated into vector containing the neighboring gene portion. Inserts were isolated
from plasmids and vectors were cut by double digestion for 2 hours at 37°C with the appropriate restriction enzymes.
The digested vector and insert were purified by agarose gel electrophoresis. Insert and vector were ligated for 5 minutes
at room temperature following the procedure in the Rapid DNA Ligation Kit (Boehringer Mannheim). One half of the
ligation mixture was transformed into DH5α cells and 1/10 of the transformation mixture was plated on LB agar plates
containing 70 µg/mL ampicillin. After overnight incubation at 37°C, ampicillin-resistant clones were picked and screened
for the presence of insert DNA as described above. Clones were confirmed by dideoxy termination sequencing.
[0213] In a similar manner, the remainder of the collagen gene was constructed such that the final DNA sequence
is that given in Figures 50A-50E (SEQ. ID. NO. 31).

B) Expression of the gene in E. coli:

[0214] Following construction of the entire human collagen Type I (α2) gene with codon usage optimized for E. coli,
the cloned gene is expressed in E. coli. A plasmid (pHuCol(α2) Ec, Figure 55) encoding the entire collagen gene
(Figures 50A-50E) placed behind the isopropyl-β-D-thiogalactopyranoside (IPTG)-inducible tac promoter and also
encoding β-lactamase is transformed into Escherichia coli strain DH5α (supE44 ΔlacU169 (φ80lacZ ΔM15) hsdR17 recA1
endA1 gyrA96 thi-1 relA1) by standard heat shock transformation. Transformation cultures are plated on Luria Broth
(LB) containing 100 µg/mL ampicillin and after overnight growth a single ampicillin-resistant colony is used to inoculate
10 mL of LB containing 100 µg/mL ampicillin. After growth for 10-16 hours with shaking (225 rpm) at 37°C, this culture
is used to inoculate 1 L of LB containing 100 µg/mL ampicillin in a 1.5 L shaker flask. After growth at 37°C, 225 rpm,
for 2 hours post-inoculation, the optical density at 600 nm is approximately 0.5 OD/mL. IPTG is added to 1 mM and the
culture is allowed to grow for an additional 5-10 hours. Cells are harvested by centrifugation (5000 rpm, 10 minutes) and
lysed by mechanical disruption. Recombinant human collagen is purified by ammonium sulfate fractionation and column
chromatography. The yield is typically 15-25 mg/L of culture.


EXAMPLE 15 

Expression in E. coli of Fragments of Human Collagen Type I (α2) with Optimized E. coli Codon Usage

[0215] A plasmid (pN1-2, Figure 56) encoding the gene for the amino terminal 80 amino acids of human collagen
Type I (α2) (SEQ. ID. NO. 31, Fig. 54) with optimized E. coli codon usage placed behind the
isopropyl-β-D-thiogalactopyranoside (IPTG)-inducible tac promoter and also encoding β-lactamase is transformed into Escherichia coli strain
DH5α (supE44 ΔlacU169 (φ80lacZ ΔM15) hsdR17 recA1 endA1 gyrA96 thi-1 relA1) by standard heat shock
transformation. Transformation cultures are plated on Luria Broth (LB) containing 100 µg/mL ampicillin and after overnight
growth a single ampicillin-resistant colony is used to inoculate 10 mL of LB containing 100 µg/mL ampicillin. After
growth for 10-16 hours with shaking (225 rpm) at 37°C, this culture is used to inoculate 1 L of LB containing 100 µg/mL
ampicillin in a 1.5 L shaker flask. After growth at 37°C, 225 rpm, for 2 hours post-inoculation, the optical density at
600 nm is approximately 0.5 OD/mL. IPTG is added to 1 mM and the culture is allowed to grow for an additional 5-10
hours. Cells are harvested by centrifugation (5000 rpm, 10 minutes) and lysed by mechanical disruption. Recombinant
human collagen is purified by ammonium sulfate fractionation and column chromatography. The yield is typically 15-25
mg/L of culture.

EXAMPLE 16 

Hydroxyproline Incorporation Into Proteins In E. coli Under Proline Starvation Conditions

[0216] Seven plasmids, pGEX-4T.1 (Fig. 73), pTrc-TGF (Fig. 74), pMal-C2 (Fig. 1), pTrc-FN (Fig. 75), pTrc-FN-TGF
(Fig. 76), pTrc-FN-Bmp (Fig. 77) and pGEX-HuColl Ec, each separately containing genes encoding the following
proteins: glutathione S-transferase (GST), the mature human TGF-β1 polypeptide (TGF-β1), maltose-binding protein
(MBP), a 70 kDa fragment of human fibronectin (FN), a fusion of FN and TGF-β1 (FN-TGF-β1), a fusion of FN and
human bone morphogenic protein 2A (FN-BMP-2A), and a fusion of GST and collagen (GST-Coll), were used
individually to transform proline auxotrophic E. coli strain JM109 (F-). Transformation cultures were plated on LB agar
containing 100 µg/ml ampicillin. After overnight incubation at 37°C, a single colony from a fresh transformation plate was
used to inoculate 5 ml of LB media containing 400 µg/ml ampicillin. After overnight growth at 37°C, this culture was
centrifuged, the supernatant discarded, and the cell pellet washed twice with 5 ml of M9 medium (1X M9 salts, 0.5%
glucose, 1 mM MgCl2, 0.01% thiamine, 200 µg/ml glycine, 200 µg/ml alanine, 100 µg/ml of the other amino acids except
proline, and 400 µg/ml ampicillin). The cells were finally resuspended in 5 ml of M9 medium. After incubation with
shaking at 37°C for 30 minutes, trans-4-hydroxyproline was added to 40 mM, NaCl to 0.5 M, and
isopropyl-β-D-thiogalactopyranoside to 1.5 mM. In certain cultures one of these additions was not made, as indicated in the labels for the
lanes of the gels. After addition, incubation with shaking at 37°C was continued. After 4 hours, the cultures were
centrifuged, the supernatants discarded, and the cell pellets resuspended in SDS-PAGE sample buffer (300 mM Tris
(pH 6.8)/0.5% SDS/10% glycerol/0.4M β-mercaptoethanol/0.2% bromophenol blue) to 15 OD600nm AU/ml, placed
in a boiling water bath for five minutes, and electrophoresed in denaturing polyacrylamide gels. Proteins in the gels
were visualized by staining with Coomassie Blue R250. The results of the gels are depicted in scans shown in Figs.
57-59. The scans relating to GST, TGF-β1, MBP, FN, FN-TGF-β1, and FN-BMP-2A (Figs. 57 and 58) show three lanes
relating to each peptide, i.e., one lane indicating +NaCl/+Hyp wherein NaCl (hyperosmotic) and trans-4-hydroxyproline
are present; one lane indicating -NaCl wherein trans-4-hydroxyproline is present but NaCl is not; and one lane indicating
-Hyp wherein NaCl is present but trans-4-hydroxyproline is not. Asterisks on the scans mark protein bands which correspond
to the expressed target protein. The instances in which target protein was expressed all involve +NaCl in connection
with +Hyp, thus demonstrating +NaCl and +Hyp dependence.
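
To make the additions above concrete, the final concentrations can be converted to masses for the 5 ml culture. The molar masses used below (trans-4-hydroxyproline 131.13 g/mol, NaCl 58.44 g/mol, IPTG 238.30 g/mol) are standard reference values and are not given in this description; adding the reagents as solids rather than from stocks is also an assumption, and the function name is hypothetical.

```python
# Standard molar masses in g/mol (reference values, not from the text).
MOLAR_MASS = {
    "trans-4-hydroxyproline": 131.13,
    "NaCl": 58.44,
    "IPTG": 238.30,
}

# Final concentrations in mM, as stated for the induced cultures.
FINAL_MM = {
    "trans-4-hydroxyproline": 40.0,
    "NaCl": 500.0,
    "IPTG": 1.5,
}


def mass_mg(conc_mM, volume_ml, molar_mass):
    """Mass in mg needed for conc_mM in volume_ml.

    mM x mL gives micromoles; multiplied by g/mol gives micrograms.
    """
    return conc_mM * volume_ml * molar_mass / 1000.0


masses = {name: mass_mg(FINAL_MM[name], 5.0, MOLAR_MASS[name]) for name in FINAL_MM}
for name, mg in masses.items():
    print(f"{name}: {mg:.2f} mg per 5 ml culture")
```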

[0217] The scan shown in Fig. 59 relating to GST-collagen shows four lanes relating to GST-Coll, i.e., one lane
indicating +Hyp/+NaCl/-IPTG wherein trans-4-hydroxyproline and NaCl are present but IPTG (the protein expression
inducer) is not and, since there is no inducer, there is no target protein band; one lane indicating +NaCl/+IPTG/-Hyp
wherein NaCl and IPTG are present but trans-4-hydroxyproline is not and, since trans-4-hydroxyproline is not present,
no target protein band is evident; one lane indicating +NaCl/+Pro/+IPTG wherein NaCl, proline and IPTG are present,
but since the target protein is not stable when it contains proline, there is no target protein band; and one lane designated
+IPTG/+NaCl/+Hyp wherein IPTG, NaCl and trans-4-hydroxyproline are present and, since the protein is stabilized by
the presence of trans-4-hydroxyproline, an asterisk-marked protein band is evident.
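
The four GST-Coll lanes described above can be summarized as a simple rule: a target band is reported only when inducer, hyperosmotic NaCl, and trans-4-hydroxyproline are all present. The sketch below merely restates the reported outcomes as a boolean table; it is not a predictive model, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lane:
    iptg: bool
    nacl: bool
    hyp: bool
    pro: bool = False  # proline supplied in place of hydroxyproline


def band_expected(lane):
    """True when the text reports a target protein band for this condition."""
    return lane.iptg and lane.nacl and lane.hyp


lanes = {
    "+Hyp/+NaCl/-IPTG": Lane(iptg=False, nacl=True, hyp=True),
    "+NaCl/+IPTG/-Hyp": Lane(iptg=True, nacl=True, hyp=False),
    "+NaCl/+Pro/+IPTG": Lane(iptg=True, nacl=True, hyp=False, pro=True),
    "+IPTG/+NaCl/+Hyp": Lane(iptg=True, nacl=True, hyp=True),
}

for label, lane in lanes.items():
    print(label, "band" if band_expected(lane) else "no band")
```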

EXAMPLE 17 

Hydroxyproline incorporation into a collagen-like peptide in E. coli. 


[0218] A plasmid (pGST-CM4, Figure 60) containing the gene for collagen mimetic 4 (CM4, Figure 61) (SEQ. ID.
NO. 39) genetically linked to the 3' end of the gene for S. japonicum glutathione S-transferase was used to transform
by electroporation proline auxotrophic E. coli strain JM109 (F-). Transformation cultures were plated on LB agar
containing 100 µg/ml ampicillin. After overnight incubation at 37°C, a single colony from a fresh transformation plate was
used to inoculate 5 ml of LB media containing 100 µg/ml ampicillin. After overnight growth at 37°C, 500 µl of this
culture was centrifuged, the supernatant discarded, and the cell pellet washed once with 500 µl of M9 medium (1X M9
salts, 0.5% glucose, 1 mM MgCl2, 0.01% thiamine, 200 µg/ml glycine, 200 µg/ml alanine, 100 µg/ml of the other amino
acids except proline, and 400 µg/ml ampicillin). The cells were finally resuspended in 5 ml of M9 medium containing 10
µg/ml proline and 2 ml of this was used to inoculate 30 ml of M9 medium containing 10 µg/ml proline. After incubation
with shaking at 37°C for 8 hours, the culture was centrifuged and the cell pellet washed once with M9 medium containing
5 µg/ml proline. The pellet was resuspended in 15 ml of M9 medium containing 5 µg/ml of proline and this culture was
used to inoculate 1 L of M9 medium containing 5 µg/ml of proline. This culture was grown for 18 hours at 37°C to
proline starvation. At this time, the culture was centrifuged, the cells washed once with M9 medium (with no proline),
and the cells resuspended in 1 L of M9 medium containing 80 mM hydroxyproline, 0.5 M NaCl, and 1.5 mM
isopropyl-β-D-thiogalactopyranoside. Incubation was continued at 37°C with shaking for 22 hours. The cultures were centrifuged
and the cell pellets stored at -20°C until processed further.

EXAMPLE 18 

Proline incorporation into a collagen-like peptide in E. coli.

[0219] A plasmid (pGST-CM4, Figure 60) containing the gene for collagen mimetic 4 (CM4, Figure 61) (SEQ. ID.
NO. 39) genetically linked to the 3' end of the gene for S. japonicum glutathione S-transferase was used to transform
by electroporation proline auxotrophic E. coli strain JM109 (F-). Transformation cultures were plated on LB agar
containing 100 µg/ml ampicillin. After overnight incubation at 37°C, a single colony from a fresh transformation plate was
used to inoculate 5 ml of LB media containing 100 µg/ml ampicillin. After overnight growth at 37°C, 500 µl of this
culture was centrifuged, the supernatant discarded, and the cell pellet washed once with 500 µl of M9 medium (1X M9
salts, 0.5% glucose, 1 mM MgCl2, 0.01% thiamine, 200 µg/ml glycine, 200 µg/ml alanine, 100 µg/ml of the other amino
acids except proline, and 400 µg/ml ampicillin). The cells were finally resuspended in 5 ml of M9 medium containing
10 µg/ml proline and 2 ml of this was used to inoculate 30 ml of M9 medium containing 10 µg/ml proline. This culture
was incubated with shaking at 37°C for 8 hours. The culture was centrifuged and the cell pellet washed once with M9
medium containing 5 µg/ml proline. The pellet was resuspended in 15 ml of M9 medium containing 5 µg/ml of proline
and this culture was used to inoculate 1 L of M9 medium containing 5 µg/ml of proline. This culture was grown for 18
hours at 37°C to proline starvation. At this time, the culture was centrifuged, the cells washed once with M9 medium
(with no proline), and finally the cells were resuspended in 1 L of M9 medium containing 2.5 mM proline, 0.5 M NaCl,
and 1.5 mM isopropyl-β-D-thiogalactopyranoside. Incubation was continued at 37°C with shaking for 22 hours. The
cultures were then centrifuged and the cell pellets stored at -20°C until processed further.

EXAMPLE 19 


Purification of hydroxyproline-containing collagen-like peptide from E. coli 

[0220] The cell pellet from a 1 L fermentation culture prepared as described in Example 17 above was resuspended
in 20 ml of Dulbecco's phosphate buffered saline (pH 7.1) (PBS) containing 1 mM EDTA, 100 µM PMSF, 0.5 µg/ml
E64, and 0.7 µg/ml pepstatin (resuspension buffer). The cells were lysed by twice passing through a French press.
Following lysis, the suspension was centrifuged for 30 minutes at 30,000 x g. The supernatant was discarded and the
pellet washed once with 5 ml of resuspension buffer containing 1 M urea and 0.5% Triton X100 followed by one wash
with 7 ml of resuspension buffer without urea or Triton X100. The pellet was finally resuspended in 5 ml of 6M guanidine
hydrochloride in Dulbecco's phosphate buffered saline (pH 7.1) containing 1 mM EDTA and 2 mM β-mercaptoethanol
and sonicated on ice for 3 x 60 seconds (microtip, power = 3.5, Heat Systems XL-2020 model sonicator). The sonicated
suspension was incubated at 4°C for 18 hours and then centrifuged at 14,000 rpm in a microcentrifuge. The supernatant
(6 ml) was dialyzed (10,000 MWCO) against 4 x 4 L of distilled water at 4°C. The contents of the dialysis tubing were
transferred to a 150 ml round bottom flask and lyophilized to dryness. The residue (~30 mg) was dissolved in 3 ml of
70% formic acid and 40 mg of cyanogen bromide was added. The flask was flushed once with nitrogen, evacuated,
and allowed to stir for 18 hours at room temperature. The contents of the flask were taken to dryness in vacuo at room
temperature, the residue resuspended in 5 ml of distilled water and evaporated to dryness again. This was repeated
2 times. The residue was finally dissolved in 2 ml of 0.2% trifluoroacetic acid (TFA). The trifluoroacetic acid-soluble
material was applied in 100 µl aliquots to a Poros R2 column (4.6 mm x 100 mm) running at 5 ml/min with a starting
buffer of 98% 0.1% trifluoroacetic acid in water/2% 0.1% TFA in acetonitrile. The hydroxyproline-containing protein
was eluted with a gradient of 2% 0.1% TFA/acetonitrile to 40% 0.1% TFA/acetonitrile over 25 column volumes (Fig.
62A). The collagen mimetic eluted between 18 and 23% 0.1% TFA/acetonitrile. Figure 62A is a chromatogram of the
elution of hydroxyproline-containing CM4 from a Poros RP2 column (available from Perseptive Biosystems, Framingham,
MA). The arrow indicates the peak containing hydroxyproline-containing CM4. Fractions were assayed by SDS-PAGE
and collagen mimetic-containing fractions were pooled and lyophilized. Lyophilized material was stored at
-20°C.
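
The gradient described above can be translated into approximate run times. The sketch below assumes a linear gradient, uses the empty-cylinder volume of a 4.6 mm x 100 mm column as the column volume, and ignores system dwell volume; none of these assumptions is stated in the description, and the function names are hypothetical.

```python
import math


def column_volume_ml(diameter_mm, length_mm):
    """Geometric volume of a cylindrical column in mL (1 cm^3 = 1 mL)."""
    radius_cm = diameter_mm / 20.0
    return math.pi * radius_cm ** 2 * (length_mm / 10.0)


def time_at_percent_b(percent_b, start_b, end_b, gradient_cv, cv_ml, flow_ml_min):
    """Minutes from gradient start until the eluent reaches percent_b (linear gradient)."""
    fraction = (percent_b - start_b) / (end_b - start_b)
    return fraction * gradient_cv * cv_ml / flow_ml_min


cv = column_volume_ml(4.6, 100.0)
gradient_minutes = 25 * cv / 5.0
t_start = time_at_percent_b(18.0, 2.0, 40.0, 25, cv, 5.0)
t_end = time_at_percent_b(23.0, 2.0, 40.0, 25, cv, 5.0)
print(f"Column volume: {cv:.2f} mL; gradient length: {gradient_minutes:.1f} min")
print(f"18-23% B is reached between {t_start:.1f} and {t_end:.1f} min")
```

Under these assumptions the reported elution window falls within the first half of the gradient; the actual retention time also depends on dwell volume and column packing.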

EXAMPLE 20 

Purification of proline-containing collagen-like peptide from E. coli

[0221] The cell pellet from a 500 ml fermentation culture prepared as described in Example 18 above was resuspended
in 20 ml of Dulbecco's phosphate buffered saline (pH 7.1) (PBS) containing 10 mM EDTA, 100 µM PMSF, 0.5
µg/ml E64, and 0.06 µg/ml aprotinin. Lysozyme (2 mg) was added and the suspension incubated at 4° C for 60 minutes.
The suspension was sonicated for 5 x 60 seconds (microtip, power = 3.5, Heat Systems XL-2020 model sonicator).
The sonicated suspension was centrifuged at 20,000 x g for 15 minutes. The supernatant was adjusted to 1% Triton
X100 and incubated for 30 minutes at room temperature with 7 ml of glutathione sepharose 4B pre-equilibrated in PBS.
The suspension was centrifuged at 500 rpm for 3 minutes. The supernatant was decanted, and the resin washed 3 times
with 8 ml of PBS. Bound proteins were eluted with 3 aliquots (2 ml each, 10 minutes gentle rocking at room temperature)
of 10 mM glutathione in 50 mM Tris (pH 8.0). Eluants were combined and dialyzed (10,000 MWCO) against 3 x 4 L of
distilled water at 4° C. The contents of the dialysis tubing were transferred to a 150 ml round bottom flask and lyophilized
to dryness. The residue was dissolved in 3 ml of 70% formic acid and 4 mg of cyanogen bromide was added. The flask
was flushed once with nitrogen, evacuated, and allowed to stir for 18 hours at room temperature. The contents of the
flask were taken to dryness in vacuo at room temperature, the residue resuspended in 5 ml of distilled water, and
evaporated to dryness again. This was repeated 2 times. The residue was finally dissolved in 2 ml of 0.2% trifluoroacetic
acid (TFA). The trifluoroacetic acid-soluble material was applied in 100 µl aliquots to a Poros R2 column (4.6 mm x
100 mm) running at 5 ml/min. with a starting buffer of 98% 0.1% trifluoroacetic acid in water/2% 0.1% TFA in acetonitrile.
Bound protein was eluted with a gradient of 2% 0.1% TFA/acetonitrile to 40% 0.1% TFA/acetonitrile over 25 column
volumes (Figure 62B). The collagen-mimetic eluted between 24 and 27% 0.1% TFA/acetonitrile. Figure 62B is a
chromatogram of the elution of proline-containing CM4 from a Poros RP2 column. The arrow indicates the peak containing
proline-containing CM4. Fractions were assayed by SDS-PAGE and collagen mimetic-containing fractions were pooled
and lyophilized. Lyophilized material was stored at -20° C.
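
As a rough check on the scale of the reverse-phase step in Examples 19 and 20, the following sketch estimates the volume of the 4.6 mm x 100 mm column and the time a 25 column-volume gradient would take at 5 ml/min. It is an illustration only: it assumes the empty-cylinder (geometric) volume is an acceptable stand-in for the column volume, and the function names are hypothetical rather than taken from any instrument software.

```python
import math


def column_volume_ml(inner_diameter_mm: float, length_mm: float) -> float:
    """Geometric (empty-cylinder) column volume in ml; an assumed approximation."""
    radius_cm = inner_diameter_mm / 10.0 / 2.0
    length_cm = length_mm / 10.0
    return math.pi * radius_cm ** 2 * length_cm  # 1 cm^3 equals 1 ml


def gradient_duration_min(cv_ml: float, n_column_volumes: float,
                          flow_ml_per_min: float) -> float:
    """Time needed to deliver the stated number of column volumes."""
    return cv_ml * n_column_volumes / flow_ml_per_min


# Values stated in the text: 4.6 mm x 100 mm column, 25 column volumes, 5 ml/min.
cv = column_volume_ml(4.6, 100.0)
duration = gradient_duration_min(cv, 25, 5.0)
print(f"column volume ~ {cv:.2f} ml, gradient ~ {duration:.1f} min")
```

Under these assumptions the gradient spans roughly 42 ml of mobile phase, so each run is short relative to the 2 ml sample being applied in 100 µl aliquots.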

EXAMPLE 21

Amino acid analysis of hydroxyproline-containing collagen mimetic and proline-containing collagen mimetic. 

[0222] Approximately 30 µg of purified hydroxyproline-containing collagen mimetic and proline-containing collagen
mimetic prepared as described in Examples 19 and 20, respectively, were dissolved in 250 µl of 6N hydrochloric acid
in glass ampules. The ampules were flushed two times with nitrogen, sealed under vacuum, and incubated at 110°C
for 23 hours. Following hydrolysis, samples were removed from the ampules and taken to dryness in vacuo. The
samples were dissolved in 15 µl of 0.1 N hydrochloric acid and subjected to amino acid analysis on a Hewlett Packard
AminoQuant 1090 amino acid analyzer utilizing standard OPA and FMOC derivatization chemistry. Examples of the
results of the amino acid analysis that illustrate the region of the chromatograms where the secondary amino acids
(proline and hydroxyproline) elute are shown in Figures 63A through 63D. These Figures also show chromatograms
of proline and hydroxyproline amino acid standards. More particularly, Figure 63A depicts a chromatogram of a proline
amino acid standard (250 pmol); * indicates a contaminating peak. Figure 63B depicts a chromatogram of a
hydroxyproline amino acid standard (250 pmol); * indicates a contaminating peak. Figure 63C depicts an amino acid
analysis chromatogram of the hydrolysis of proline-containing CM4. Only the region of the chromatogram where proline
and hydroxyproline elute is shown; * indicates a contaminating peak. Figure 63D depicts an amino acid analysis
chromatogram of the hydrolysis of hydroxyproline-containing CM4. Only the region of the chromatogram where proline
and hydroxyproline elute is shown; * indicates a contaminating peak.

EXAMPLE 22 

Determination of proline starvation conditions for E. coli (strain JM109 (F-))

[0223] A plasmid (pGST-CM4, Figure 60) containing the gene for collagen mimetic 4 (CM4, Figure 61) genetically
linked to the 3' end of the gene for S. japonicum glutathione S-transferase was used to transform by electroporation
proline auxotrophic E. coli strain JM109 (F-). Transformation cultures were plated on LB agar containing 100 µg/ml
ampicillin. After overnight incubation at 37° C, a single colony from a fresh transformation plate was used to inoculate
2 ml of M9 media (1X M9 salts, 0.5% glucose, 1 mM MgCl2, 0.01% thiamine, 200 µg/ml glycine, 200 µg/ml alanine,
100 µg/ml of the other amino acids except proline, and 200 µg/ml carbenicillin) and containing 20 µg/ml proline. After
growth at 37° C with shaking for 8 hours, 1.5 ml was used to inoculate 27 ml of M9 media containing 45 µg/ml proline.
After incubation at 37° C with shaking for 7 hours, the culture was centrifuged, the cell pellet washed with 7 ml of M9
media with no proline, and finally resuspended in 17 ml of M9 media with no proline. This culture was used to inoculate
four 35 ml cultures of M9 media containing 4 µg/ml proline at an OD600 of 0.028. Cultures were incubated with shaking
at 37° C and the OD600 monitored. After 13.5 hours growth, the OD600 had plateaued. At this time, one culture was
supplemented with proline at 15 µg/ml, one with hydroxyproline at 15 µg/ml, one with all of the amino acids at 15 µg/ml
except proline and hydroxyproline, and one culture with nothing. Incubation was continued and the OD600 monitored
for a total of 24 hours. Figure 64 is a graph of OD600 vs. time for cultures of JM109 (F-) grown to plateau and then
supplemented with various amino acids. The point at which the cultures were supplemented is indicated with an arrow.
Proline starvation is evident since only the culture supplemented with proline continued to grow past plateau.

EXAMPLE 23 

Hydroxyproline Incorporation Into Type I (α1) Collagen in E. coli

[0224] A plasmid (pHuCol(α1)Ec, Figure 65) containing the gene for Type I (α1) collagen with optimized E. coli codon
usage (Figure 39A-39E) (SEQ. ID. NO. 19) under control of the tac promoter and containing the gene for
chloramphenicol resistance was used to transform by electroporation proline auxotrophic E. coli strain JM109 (F-).
Transformation cultures were plated on LB agar containing 20 µg/ml chloramphenicol. After overnight incubation at 37° C, a
single colony from a fresh transformation plate was used to inoculate 100 ml of LB media containing 20 µg/ml
chloramphenicol. This culture was grown to an OD600nm of 0.5 and 100 µl aliquots transferred to 1.5 ml tubes. The tubes
were stored at -80° C. For expression, a tube was thawed on ice and used to inoculate 25 ml of LB media containing
20 µg/ml chloramphenicol. After overnight growth at 37° C, a four ml aliquot was withdrawn, centrifuged, the cell pellet
washed once with 1 ml of 2x YT media containing 20 µg/ml chloramphenicol, and the washed cells used to inoculate
1 L of 2x YT medium containing 20 µg/ml chloramphenicol. This culture was grown at 37° C to an OD600nm of 0.8.
The culture was centrifuged and the cell pellet washed once with 100 ml of M9 medium (1X M9 salts, 0.5% glucose,
1 mM MgCl2, 0.01% thiamine, 200 µg/ml glycine, 200 µg/ml alanine, 100 µg/ml of the other amino acids except proline,
and 20 µg/ml chloramphenicol). The cells were resuspended in 910 ml of M9 medium (1X M9 salts, 0.5% glucose, 1
mM MgCl2, 0.01% thiamine, 200 µg/ml glycine, 200 µg/ml alanine, 100 µg/ml of the other amino acids except proline,
and 20 µg/ml chloramphenicol) and allowed to grow at 37° C for 30 minutes. NaCl (80 ml of 5 M), hydroxyproline (7.5
ml of 2M), and IPTG (500 µl of 1 M) were added and growth continued for 3 hours. Cells were harvested by centrifugation
and stored at -20° C.
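
The induction step above adds three stock solutions to 910 ml of culture. The sketch below works out the approximate final concentrations that result; it assumes the volumes are simply additive and ignores any solute already present in the M9 medium. The helper name and dictionary layout are illustrative only.

```python
def final_concentration_mM(stock_molar: float, added_ml: float, total_ml: float) -> float:
    """Final concentration (mM) after diluting a stock into the total volume."""
    return stock_molar * 1000.0 * added_ml / total_ml


culture_ml = 910.0
# (stock concentration in M, volume added in ml), as stated in the Example.
additions = {
    "NaCl": (5.0, 80.0),
    "hydroxyproline": (2.0, 7.5),
    "IPTG": (1.0, 0.5),
}

# Assumption: volumes are additive.
total_ml = culture_ml + sum(volume for _, volume in additions.values())
final = {
    name: final_concentration_mM(stock, volume, total_ml)
    for name, (stock, volume) in additions.items()
}

for name, conc in final.items():
    print(f"{name}: {conc:.2f} mM in {total_ml:.0f} ml")
```

On these assumptions the culture ends up at roughly 0.4 M added NaCl, 15 mM hydroxyproline, and 0.5 mM IPTG in about 1 L.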

EXAMPLE 24

Hydroxyproline Incorporation Into Type I (α2) in E. coli

[0225] A plasmid (pHuCol(α2)Ec, Figure 66) containing the gene for Type I (α2) collagen with optimized E. coli codon
usage (Figure 50A-50E) (SEQ. ID. NO. 31) under control of the tac promoter and containing the gene for
chloramphenicol resistance was used to transform by electroporation proline auxotrophic E. coli strain JM109 (F-).
Transformation cultures were plated on LB agar containing 20 µg/ml chloramphenicol. After overnight incubation at 37° C, a
single colony from a fresh transformation plate was used to inoculate 100 ml of LB media containing 20 µg/ml
chloramphenicol. This culture was grown to an OD600nm of 0.5 and 100 µl aliquots transferred to 1.5 ml tubes. The tubes
were stored at -80° C. For expression, a tube was thawed on ice and used to inoculate 25 ml of LB media containing
20 µg/ml chloramphenicol. After overnight growth at 37° C, a four ml aliquot was withdrawn, centrifuged, the cell pellet
washed once with 1 ml of 2x YT media containing 20 µg/ml chloramphenicol, and the washed cells used to inoculate
1 L of 2x YT medium containing 20 µg/ml chloramphenicol. This culture was grown at 37° C to an OD600nm of 0.8.
The culture was centrifuged and the cell pellet washed once with 100 ml of M9 medium (1X M9 salts, 0.5% glucose,
1 mM MgCl2, 0.01% thiamine, 200 µg/ml glycine, 200 µg/ml alanine, 100 µg/ml of the other amino acids except proline,
and 20 µg/ml chloramphenicol). The cells were resuspended in 910 ml of M9 medium (1X M9 salts, 0.5% glucose, 1
mM MgCl2, 0.01% thiamine, 200 µg/ml glycine, 200 µg/ml alanine, 100 µg/ml of the other amino acids except proline,
and 20 µg/ml chloramphenicol) and allowed to grow at 37° C for 30 minutes. NaCl (80 ml of 5 M), hydroxyproline (7.5
ml of 2M), and IPTG (500 µl of 1 M) were added and growth continued for 3 hours. Cells were harvested by centrifugation
and stored at -20° C.

EXAMPLE 25 

Hydroxyproline Incorporation Into a C-terminal Fragment of Type I (α1) Collagen in E. coli

[0226] A plasmid (pD4-α1, Figure 67) encoding the gene for the carboxy terminal 219 amino acids of human Type
I (α1) collagen with optimized E. coli codon usage fused to the 3'-end of the gene for glutathione S-transferase and
under control of the tac promoter and containing the gene for ampicillin resistance was used to transform by
electroporation proline auxotrophic E. coli strain JM109 (F-). Transformation cultures were plated on LB agar containing 100
µg/ml ampicillin. After overnight incubation at 37° C, a single colony from a fresh transformation plate was used to
inoculate 100 ml of LB media containing 100 µg/ml ampicillin. This culture was grown to an OD600nm of 0.5 and 100
µl aliquots transferred to 1.5 ml tubes. The tubes were stored at -80° C. For expression, a tube was thawed on ice and
used to inoculate 25 ml of LB media containing 400 µg/ml ampicillin. After overnight growth at 37° C, a four ml aliquot
was withdrawn, centrifuged, the cell pellet washed once with 1 ml of 2x YT media containing 400 µg/ml ampicillin, and
the washed cells used to inoculate 1 L of 2x YT medium containing 400 µg/ml ampicillin. This culture was grown at
37° C to an OD600nm of 0.8. The culture was centrifuged and the cell pellet washed once with 100 ml of M9 medium
(1X M9 salts, 0.5% glucose, 1 mM MgCl2, 0.01% thiamine, 200 µg/ml glycine, 200 µg/ml alanine, 100 µg/ml of the
other amino acids except proline, and 400 µg/ml ampicillin). The cells were resuspended in 910 ml of M9 medium (1X
M9 salts, 0.5% glucose, 1 mM MgCl2, 0.01% thiamine, 200 µg/ml glycine, 200 µg/ml alanine, 100 µg/ml of the other
amino acids except proline, and 400 µg/ml ampicillin) and allowed to grow at 37° C for 30 minutes. NaCl (80 ml of 5
M), hydroxyproline (7.5 ml of 2M), and IPTG (500 µl of 1 M) were added and growth continued for 3 hours. Cells were
harvested by centrifugation and stored at -20° C.

EXAMPLE 26

Hydroxyproline Incorporation Into a C-terminal Fragment of Type I (α2) Collagen in E. coli

[0227] A plasmid (pD4-α2, Figure 68) encoding the gene for the carboxy terminal 219 amino acids of human Type
I (α2) collagen with optimized E. coli codon usage as constructed in accordance with Example 14A fused to the 3'-end
of the gene for glutathione S-transferase and under control of the tac promoter and containing the gene for ampicillin
resistance was used to transform by electroporation proline auxotrophic E. coli strain JM109 (F-). Transformation
cultures were plated on LB agar containing 100 µg/ml ampicillin. After overnight incubation at 37° C, a single colony from
a fresh transformation plate was used to inoculate 100 ml of LB media containing 100 µg/ml ampicillin. This culture
was grown to an OD600nm of 0.5 and 100 µl aliquots transferred to 1.5 ml tubes. The tubes were stored at -80° C.
For expression, a tube was thawed on ice and used to inoculate 25 ml of LB media containing 400 µg/ml ampicillin.
After overnight growth at 37° C, a four ml aliquot was withdrawn, centrifuged, the cell pellet washed once with 1 ml of
2x YT media containing 400 µg/ml ampicillin, and the washed cells used to inoculate 1 L of 2x YT medium containing
400 µg/ml ampicillin. This culture was grown at 37° C to an OD600nm of 0.8. The culture was centrifuged and the cell
pellet washed once with 100 ml of M9 medium (1X M9 salts, 0.5% glucose, 1 mM MgCl2, 0.01% thiamine, 200 µg/ml
glycine, 200 µg/ml alanine, 100 µg/ml of the other amino acids except proline, and 400 µg/ml ampicillin). The cells
were resuspended in 910 ml of M9 medium (1X M9 salts, 0.5% glucose, 1 mM MgCl2, 0.01% thiamine, 200 µg/ml
glycine, 200 µg/ml alanine, 100 µg/ml of the other amino acids except proline, and 400 µg/ml ampicillin) and allowed
to grow at 37° C for 30 minutes. NaCl (80 ml of 5 M), hydroxyproline (7.5 ml of 2M), and IPTG (500 µl of 1 M) were
added and growth continued for 3 hours. Cells were harvested by centrifugation and stored at -20° C.

EXAMPLE 27 

Purification of Hydroxyproline-containing C-terminal Fragment of Type I (α1) Collagen

[0228] Cell paste harvested from a 1 L culture grown as in Example 25 was resuspended in 30 ml of lysis buffer (2M
urea, 137mM NaCl, 2.7mM KCl, 4.3mM Na2HPO4, 1.4mM KH2PO4, 10mM EDTA, 10mM BME, 0.1% Triton X-100, pH
7.4) at 4°C. Lysozyme (chicken egg white) was added to 100 µg/ml and the solution incubated at 4°C for 30 minutes.
The solution was passed twice through a cell disruption press (SLM Instruments, Rochester, NY) and then centrifuged
at 30,000 x g for 30 minutes. The pellet was resuspended in 30 ml of 50 mM Tris-HCl, pH 7.6, centrifuged at 30,000
x g for 30 minutes, and the pellet solubilized in 25 ml of solubilization buffer (8M urea, 137mM NaCl, 2.7mM KCl, 4.3mM
Na2HPO4, 1.4mM KH2PO4, 5mM EDTA, 5mM BME). The solution was centrifuged at 30,000 x g for 30 minutes and
the supernatant dialyzed against two changes of 4 L of distilled water at 4°C. Following dialysis, the entire mixture was
lyophilized. The lyophilized solid was dissolved in 0.1 M HCl in a flask with stirring. After addition of a 5-fold excess of
crystalline BrCN, the flask was evacuated and filled with nitrogen. Cleavage was allowed to proceed for 24 hrs, at
which time the solvent was removed in vacuo. The residue was dissolved in 0.1% trifluoroacetic acid (TFA) and purified
by reverse-phase HPLC using a Vydac C4 RP-HPLC column (10 x 250 mm, 5 µm, 300 Å) on a BioCad Sprint system
(Perceptive Biosystems, Framingham, MA). Hydroxyproline-containing D4 protein was eluted with a gradient of 15-40%
acetonitrile/0.1% TFA over a 45 minute period. Protein D4-α1 eluted at 26% acetonitrile/0.1% TFA.

EXAMPLE 28 

Purification of Hydroxyproline-containing C-terminal Fragment of Type I (α2) Collagen

[0229] Cell paste harvested from a 1 L culture grown as in Example 26 was resuspended in 30 ml of lysis buffer (2M
urea, 137mM NaCl, 2.7mM KCl, 4.3mM Na2HPO4, 1.4mM KH2PO4, 10mM EDTA, 10mM BME, 0.1% Triton X-100, pH
7.4) at 4°C. Lysozyme (chicken egg white) was added to 100 µg/ml and the solution incubated at 4°C for 30 minutes.
The solution was passed twice through a cell disruption press (SLM Instruments, Rochester, NY) and then centrifuged
at 30,000 x g for 30 minutes. The pellet was resuspended in 30 ml of 50 mM Tris-HCl, pH 7.6, centrifuged at 30,000
x g for 30 minutes, and the pellet solubilized in 25 ml of solubilization buffer (8M urea, 137mM NaCl, 2.7mM KCl, 4.3mM
Na2HPO4, 1.4mM KH2PO4, 5mM EDTA, 5mM BME). The solution was centrifuged at 30,000 x g for 30 minutes and
the supernatant dialyzed against two changes of 4 L of distilled water at 4°C. Following dialysis, the entire mixture was
lyophilized. The lyophilized solid was dissolved in 0.1 M HCl in a flask with stirring. After addition of a 5-fold excess of
crystalline BrCN, the flask was evacuated and filled with nitrogen. Cleavage was allowed to proceed for 24 hrs, at
which time the solvent was removed in vacuo. The residue was dissolved in 0.1% trifluoroacetic acid (TFA) and purified
by reverse-phase HPLC using a Vydac C4 RP-HPLC column (10 x 250 mm, 5 µm, 300 Å) on a BioCad Sprint system
(Perceptive Biosystems, Framingham, MA). Hydroxyproline-containing D4 protein was eluted with a gradient of 15-40%
acetonitrile/0.1% TFA over a 45 minute period. Protein D4-α2 eluted at 25% acetonitrile/0.1% TFA.
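
To relate the reported elution compositions in Examples 27 and 28 to a position in the run, the sketch below interpolates along the 15-40% acetonitrile gradient delivered over 45 minutes. It assumes the gradient is strictly linear and ignores system dwell volume and column dead time, so the resulting times are programmed-gradient positions, not retention times reported in the text. The function name is hypothetical.

```python
def time_at_composition(start_pct: float, end_pct: float,
                        duration_min: float, target_pct: float) -> float:
    """Minutes into a linear gradient at which the programmed composition
    reaches target_pct (assumes linearity, no dwell volume)."""
    low, high = min(start_pct, end_pct), max(start_pct, end_pct)
    if not low <= target_pct <= high:
        raise ValueError("target composition lies outside the gradient range")
    return (target_pct - start_pct) / (end_pct - start_pct) * duration_min


# 15-40% acetonitrile/0.1% TFA over 45 minutes, as stated in the Examples.
t_alpha1 = time_at_composition(15.0, 40.0, 45.0, 26.0)  # D4-alpha1 at 26%
t_alpha2 = time_at_composition(15.0, 40.0, 45.0, 25.0)  # D4-alpha2 at 25%
print(f"26% reached at ~{t_alpha1:.1f} min; 25% reached at ~{t_alpha2:.1f} min")
```

The gradient slope here is 25 percentage points over 45 minutes, so a one-point difference in elution composition corresponds to 1.8 minutes of programmed gradient.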

EXAMPLE 29 

Amino Acid Composition Analysis of Hydroxyproline-containing C-terminal Fragment of Type I (α1) Collagen

[0230] Protein D4-α1 (10 µg) purified as in Example 27 was taken to dryness in vacuo in a 1.5 ml microcentrifuge
tube. A sample was subjected to amino acid analysis at the W.M. Keck Foundation Biotechnology Resource Laboratory
(New Haven, CT) on an Applied Biosystems sequencer equipped with an on-line HPLC system. The experimentally
determined sequence of the first 13 amino acids (SEQ. ID. NO. 41) and the sequence predicted from the DNA sequence
(SEQ. ID. NO. 42) are shown in Figure 69. A sample of protein D4-α1 was subjected to mass spectral analysis on a
VG Biotech BIO-Q quadrupole analyzer at M-Scan, Inc. (West Chester, PA). The mass spectrum and the predicted
molecular weight of protein D4-α1 if it contained 100% hydroxyproline in lieu of proline are given in Figure 70. The
predicted molecular weight of protein D4-α1 containing 100% hydroxyproline in lieu of proline is 20807.8 Da. The
experimentally determined molecular weight was 20807.5 Da.

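
The agreement between the predicted and measured masses can be put on a common scale with a short calculation. The sketch below expresses the difference in daltons and in parts per million, and compares it with the mass added by a single proline-to-hydroxyproline substitution (one oxygen atom, average atomic mass about 15.999 Da). The variable names are illustrative; the only inputs taken from the text are the two molecular weights.

```python
OXYGEN_AVERAGE_MASS_DA = 15.999  # mass gained per Pro -> Hyp substitution

predicted_da = 20807.8  # predicted for 100% hydroxyproline in lieu of proline
observed_da = 20807.5   # experimentally determined

delta_da = observed_da - predicted_da
delta_ppm = delta_da / predicted_da * 1e6
# Fraction of one hydroxylation's mass shift that the deviation represents.
fraction_of_one_hydroxyl = abs(delta_da) / OXYGEN_AVERAGE_MASS_DA

print(f"difference: {delta_da:.1f} Da ({delta_ppm:.1f} ppm)")
print(f"equivalent to {fraction_of_one_hydroxyl:.3f} of one Pro-to-Hyp mass shift")
```

The 0.3 Da difference is a small fraction of the roughly 16 Da that a single unhydroxylated proline would remove from the predicted mass, which is the sense in which the measurement is consistent with the fully substituted prediction.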
EXAMPLE 30 

Construction of Carboxy Terminal 219 Amino Acids of Human Collagen Type I (α1) Fragment Gene with Optimized E.
coli Codon Usage.

[0231] The nucleotide sequence of the 657 nucleotide gene for the carboxy terminal 219 amino acids of human Type
I (α1) collagen with optimized E. coli codon usage is shown in Figure 71. For synthesis of this gene, unique restriction
sites were identified or created approximately every 150 base pairs. Oligos of approximately 80 nucleotides were
synthesized on a Beckman Oligo 1000 DNA synthesizer, cleaved and deprotected with aqueous NH4OH, and purified
by electrophoresis in 7M urea/12% polyacrylamide gels. Each set of oligos was designed to have an EcoR I restriction
enzyme site at the 5' end, a unique restriction site near the 3' end, followed by the TAAT stop sequence and a Hind III
restriction enzyme site at the very 3' end. The first four oligos, comprising the first 84 amino acids of the carboxy terminal
219 amino acids of human Type I (α1) collagen with optimized E. coli codon usage, are given in Figure 81 (SEQ. ID.
NOS. 47-50).

[0232] Oligos N4-1 (SEQ. ID. NO. 47) and N4-2 (SEQ. ID. NO. 48) (1 µg each) were annealed in 20 µL of T7 DNA
polymerase buffer (40mM Tris-HCl (pH 8.0), 5mM MgCl2, 5mM dithiothreitol, 50mM NaCl, 0.05 mg/mL bovine serum
albumin) by heating at 90°C for 5 minutes followed by slow cooling to room temperature. After brief centrifugation at
14,000 rpm, 10 units of T7 DNA polymerase and 2 µL of a solution of all four dNTPs (dATP, dGTP, dCTP, dTTP, 2.5mM
each) were added to the annealed oligos. Extension reactions were incubated at 37°C for 30 minutes and then heated
at 70°C for 10 minutes. After cooling to room temperature, Hind III buffer (5 µL of 10x concentration), 20 µL of H2O,
and 10 units of Hind III restriction enzyme were added and the tubes incubated at 37°C for 10 hours. Hind III buffer (2
µL of 10x concentration), 13.5 µL of 0.5M Tris HCl (pH 7.5), 1.8 µL of 1% Triton X100, 5.6 µL of H2O, and 20 U of EcoR
I were added to each tube and incubation continued for 2 hours at 37°C. Digests were extracted once with an equal
volume of phenol, once with phenol/chloroform/isoamyl alcohol, and once with chloroform/isoamyl alcohol. After ethanol
precipitation, the pellet was resuspended in 10 µL of TE buffer (10mM Tris HCl (pH 8.0), 1mM EDTA). Resuspended
pellet (4 µL) was ligated overnight at 16°C with agarose gel-purified EcoRI/Hind III digested pBSKS+ vector (1 µg)
using T4 DNA ligase (100 units). One half of the ligation mixture was transformed by heat shock into DH5α
cells and 100 µL of the 1.0 mL transformation mixture was plated on Luria Broth (LB) agar plates containing 70 µg/mL
ampicillin. Plates were incubated overnight at 37°C. Ampicillin resistant colonies (6-12) were picked and grown overnight
in LB media containing 70 µg/mL ampicillin. Plasmid DNA was isolated from each culture by Wizard Minipreps
(Promega Corporation, Madison, WI) and screened for the presence of the approximately 120 base pair insert by
digestion with EcoRI and Hind III and running the digestion products on agarose electrophoresis gels. Clones with inserts
were confirmed by standard dideoxy termination DNA sequencing. The correct clone was named pBSN4-1.

[0233] Oligos N4-3 (SEQ. ID. NO. 49) and N4-4 (SEQ. ID. NO. 50) (Figure 81) were synthesized, purified, annealed,
extended, and cloned into pBSKS+ following exactly the same procedure given above for oligos N4-1 and N4-2. The
resulting plasmid was named pBSN4-2A. To clone together the sections of the collagen gene from pBSN4-1 and
pBSN4-2A, plasmid pBSN4-1 (1 µg) was digested for 2 hours at 37°C with Apa L1 and Hind III. The digested vector
was purified by agarose gel electrophoresis. Plasmid pBSN4-2A (3 µg) was digested for 2 hours at 37°C with Apa L1
and Hind III and the insert purified by agarose gel electrophoresis. Apa L1/Hind III-digested pBSN4-1 was ligated with
this insert overnight at 16°C with T4 DNA ligase. One half of the ligation mixture was transformed into DH5α cells and
1/10 of the transformation mixture was plated on LB agar plates containing 70 µg/mL ampicillin. After overnight
incubation at 37°C, ampicillin-resistant clones were picked and screened for the presence of insert DNA as described
above. Clones were confirmed by dideoxy termination sequencing. The correct clone was named pBSN4-2.

[0234] In a similar manner, the remainder of the gene for the carboxy terminal 219 amino acids of human Type I (α1)
collagen with optimized E. coli codon usage was constructed such that the final DNA sequence is that given in Figure
71 (SEQ. ID. NO: 43).

[0235] It will be understood that various modifications may be made to the embodiments disclosed herein. For
example, it is contemplated that any protein produced by prokaryotes and eukaryotes can be made to incorporate one
or more amino acid analogs in accordance with the present disclosure. Therefore, the above description should not
be construed as limiting, but merely as exemplifications of preferred embodiments. Those skilled in the art will envision
other modifications within the scope and spirit of the claims appended hereto.

Annex to the description
[0236]

SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT: GRUSKIN, ELLIOT A.
BUECHTER, DOUGLAS
BROKAW, JANE
ZHANG, GUANGHUI
PAOLELLA, DAVID

(ii) TITLE OF INVENTION: AMINO ACID MODIFIED POLYPEPTIDES

(iii) NUMBER OF SEQUENCES: 50

(iv) CORRESPONDENCE ADDRESS:

(A) ADDRESSEE: DILWORTH & BARRESE

(B) STREET: 333 EARLE OVINGTON BOULEVARD

(C) CITY: UNIONDALE

(D) STATE: NY

(E) COUNTRY: U.S.A.

(F) ZIP: 11553

(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk

(B) COMPUTER: IBM PC compatible

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(vi) CURRENT APPLICATION DATA:

(A) APPLICATION NUMBER:

(B) FILING DATE:

(C) CLASSIFICATION:

(viii) ATTORNEY/AGENT INFORMATION:
(A) NAME: STEEN, JEFFREY S

(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: (516) 228-8484
(B) TELEFAX: (516) 228-8516



(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3170 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

CAGCTGTCTT ATGGCTATGA TGAGAAATCA ACCGGAGGAA TTTCCGTGCC TGGCCCCATG 60
GGTCCCTCTG GTCCTCGTGG TCTCCCTGGC CCCCCTGGTG CACCTGGTCC CCAAGGCTTC 120
CAAGGTCCCC CTGGTGAGCC TGGCGAGCCT GGAGCTTCAG GTCCCATGGG TCCCCGAGGT 180
CCCCCAGGTC CCCCTGGAAA GAATGGAGAT GATGGGGAAG CTGGAAAACC TGGTCGTCCT 240
GGTGAGCGTG GGCCTCCTGG GCCTCAGGGT GCTCGAGGAT TGCCCGGAAC AGCTGGCCTC 300
CCTGGAATGA AGGGACACAG AGGTTTCAGT GGTTTGGATG GTGCCAAGGG AGATGCTGGT 360
CCTGCTGGTC CTAAGGGTGA GCCTGGCAGC CCTGGTGAAA ATGGAGCTCC TGGTCAGATG 420
GGCCCCCGTG GCCTGCCTGG TGAGAGAGGT CGCCCTGGAG CCCCTGGCCC TGCTGGTGCT 480
CGTGGAAATG ATGGTGCTAC TGGTGCTGCC GGGCCCCCTG GTCCCACCGG CCCCGCTGGT 540
CCTCCTGGCT TCCCTGGTGC TGTTGGTGCT AAGGGTGAAG CTGGTCCCCA AGGGCCCCGA 600
GGCTCTGAAG GTCCCCAGGG TGTGCGTGGT GAGCCTGGCC CCCCTGGCCC TGCTGGTGCT 660
GCTGGCCCTG CTGGAAACCC TGGTGCTGAT GGACAGCCTG GTGCTAAAGG TGCCAATGGT 720
GCTCCTGGTA TTGCTGGTGC TCCTGGCTTC CCTGGTGCCC GAGGCCCCTC TGGACCCCAG 780
GGCCCCGGCG GCCCTCCTGG TCCCAAGGGT AACAGCGGTG AACCTGGTGC TCCTGGCAGC 840
AAAGGAGACA CTGGTGCTAA GGGAGAGCCT GGCCCTGTTG GTGTTCAAGG ACCCCCTGGC 900
CCTGCTGGAG AGGAAGGAAA GCGAGGAGCT CGAGGTGAAC CCGGACCCAC TGGCCTGCCC 960
GGACCCCCTG GCGAGCGTGG TGGACCTGGT AGCCGTGGTT TCCCTGGCGC AGATGGTGTT 1020
GCTGGTCCCA AGGGTCCCGC TGGTGAACGT GGTTCTCCTG GCCCCGCTGG CCCCAAAGGA 1080
TCTCCTGGTG AAGCTGGTCG TCCCGGTGAA GCTGGTCTGC CTGGTGCCAA GGGTCTGACT 1140
GGAAGCCCTG GCAGCCCTGG TCCTGATGGC AAAACTGGCC CCCCTGGTCC CGCCGGTCAA 1200
GATGGTCGCC CCGGACCCCC AGGCCCACCT GGTGCCCGTG GTCAGGCTGG TGTGATGGGA 1260
TTCCCTGGAC CTAAAGGTGC TGCTGGAGAG CCCGGCAAGG CTGGAGAGCG AGGTGTTCCC 1320
GGACCCCCTG GCGCTGTCGG TCCTGCTGGC AAAGATGGAG AGGCTGGAGC TCAGGGACCC 1380
CCTGGCCCTG CTGGTCCCGC TGGCGAGAGA GGTGAACAAG GCCCTGCTGG CTCCCCCGGA 1440
TTCCAGGGTC TCCCTGGTCC TGCTGGTCCT CCAGGTGAAG CAGGCAAACC TGGTGAACAG 1500
GGTGTTCCTG GAGACCTTGG CGCCCCTGGC CCCTCTGGAG CAAGAGGCGA GAGAGGTTTC 1560
CCTGGCGAGC GTGGTGTGCA AGGTCCCCCT GGTCCTGCTG GACCCCGAGG GGCCAACGGT 1620
GCTCCCGGCA ACGATGGTGC TAAGGGTGAT GCTGGTGCCC CTGGAGCTCC CGGTAGCCAG 1680
GGCGCCCCTG GCCTTCAGGG AATGCCTGGT GAACGTGGTG CAGCTGGTCT TCCAGGGCCT 1740
AAGGGTGACA GAGGTGATGC TGGTCCCAAA GGTGCTGATG GCTCTCCTGG CAAAGATGGC 1800
GTCCGTGGTC TGACCGGCCC CATTGGTCCT CCTGGCCCTG CTGGTGCCCC TGGTGACAAG 1860
GGTGAAAGTG GTCCCAGCGG CCCTGCTGGT CCCACTGGAG CTCGTGGTGC CCCCGGAGAC 1920
CGTGGTGAGC CTGGTCCCCC CGGCCCTGCT GGCTTTGCTG GCCCCCCTGG TGCTGACGGC 1980
CAACCTGGTG CTAAAGGCGA ACCTGGTGAT GCTGGTGCCA AAGGCGATGC TGGTCCCCCT 2040
GGGCCTGCCG GACCCGCTGG ACCCCCTGGC CCCATTGGTA ATGTTGGTGC TCCTGGAGCC 2100
AAAGGTGCTC GGGCAGCGCT GGTCCCCCTG GTGCTACTGG TTTCCCTGGT GCTGCTGGCC 2160
GAGTCGGTCC TCCTGGCCCC TCTGGAAATG CTGGACCCCC TGGCCCTCCT GGTCCTGCTG 2220
GCAAAGAAGG CGGCAAAGGT CCCCGTGGTG AGACTGGCCC TGCTGGACGT CCTGGTGAAG 2280
TTGGTCCCCC TGGTCCCCCT GGCCCTGCTG GCGAGAAAGG ATCCCCTGGT GCTGATGGTC 2340
CTGCTGGTGC TCCTGGTACT CCCGGGCCTC AAGGTATTGC TGGACAGCGT GGTGTGGTCG 2400
GCCTGCCTGG TCAGAGAGGA GAGAGAGGCT TCCCTGGTCT TCCTGGCCCC TCTGGTGAAC 2460
CTGGCAAACA AGGTCCCTCT GGAGCAAGTG GTGAACGTGG TCCCCCCGGT CCCATGGGCC 2520
CCCCTGGATT GGCTGGACCC CCTGGTGAAT CTGGACGTGA GGGGGCTCCT GCTGCCGAAG 2580
GTTCCCCTGG ACGAGACGGT TCTCCTGGCG CCAAGGGTGA CCGTGGTGAG ACCGGCCCCG 2640
CTGGACCCCC TGGTGCTCCT GGTGCTCCTG GTGCCCCTGG CCCCGTTGGC CCTGCTGGCA 2700
AGAGTGGTGA TCGTGGTGAG ACTGGTCCTG CTGGTCCCGC CGGTCCCGTC GGCCCCGCTG 2760
GCGCCCGTGG CCCCGCCGGA CCCCAAGGCC CCCGTGGTGA CAAGGGTGAG ACAGGCGAAC 2820
AGGGCGACAG AGGCATAAAG GGTCACCGTG GCTTCTCTGG CCTCCAGGGT CCCCCTGGCC 2880
CTCCTGGCTC TCCTGGTGAA CAAGGTCCCT CTGGAGCCTC TGGTCCTGCT GGTCCCCGAG 2940
GTCCCCCTGG CTCTGCTGGT GCTCCTGGCA AAGATGGACT CAACGGTCTC CCTGGCCCCA 3000
TTGGGCCCCC TGGTCCTCGC GGTCGCACTG GTGATGCTGG TCCTGTTGGT CCCCCCGGCC 3060
CTCCTGGACC TCCTGGTCCC CCTGGTCCTC CCAGCGCTGG TTTCGACTTC AGCTTCCTCC 3120
CCCAGCCACC TCAAGAGAAG GCTCACGATG GTGGCCGCTA CTACCGGGCT 3170



(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 240 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:
CAGCTGTCTT ATGGCTATGA TGAGAAATCA ACCGGAGGAA TTTCCGTGCC TGGCCCCATG
GGTCCCTCTG GTCCTCGTGG TCTCCCTGGC CCCCCTGGTG CACCTGGTCC CCAAGGCTTC
CAAGGTCCCC CTGGTGAGCC TGGCGAGCCT GGAGCTTCAG GTCCCATGGG TCCCCGAGGT
CCCCCAGGTC CCCCTGGAAA GAATGGAGAT GATGGGGAAG CTGGAAAACC TGGTCGTCCT

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 100 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:
GGATCCATGG GGCTCGCTGG CCCACCGGGC GAACCGGGTC CGCCAGGCCC GAAAGGTCCG
CGTGGCGATA GCGGGCTCCC GGGCGATTCC TAATGGATCC

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 
(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:
Gly Leu Ala Gly Pro Pro Gly Glu Pro Gly Pro Pro Gly Pro Lys Gly
1 5 10 15

Pro Arg Gly Asp Ser
20
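
As a consistency check between SEQ ID NO:3 and SEQ ID NO:4, the sketch below translates the 100-base sequence with the standard genetic code and compares the result with the 21-residue peptide. Several points are assumptions made for illustration and are not stated in the listing: the base at position 31 is taken as G, the reading frame is taken to begin at the first ATG, and translation stops at the first in-frame stop codon. The function and constant names are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# SEQ ID NO:3 (position 31 read as G; an assumption).
SEQ_ID_NO_3 = (
    "GGATCCATGG" "GGCTCGCTGG" "CCCACCGGGC" "GAACCGGGTC" "CGCCAGGCCC"
    "GAAAGGTCCG" "CGTGGCGATA" "GCGGGCTCCC" "GGGCGATTCC" "TAATGGATCC"
)
# SEQ ID NO:4 in one-letter code.
SEQ_ID_NO_4 = "GLAGPPGEPGPPGPKGPRGDS"


def translate_from_first_atg(dna: str) -> str:
    """Translate from the first ATG to the first in-frame stop codon."""
    start = dna.find("ATG")
    if start < 0:
        raise ValueError("no ATG start codon found")
    residues = []
    for i in range(start, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        residues.append(amino_acid)
    return "".join(residues)


protein = translate_from_first_atg(SEQ_ID_NO_3)
print(protein)
print("SEQ ID NO:4 follows the initiator Met:", protein[1:].startswith(SEQ_ID_NO_4))
```

Under these assumptions the open reading frame encodes the initiator methionine, the 21 residues of SEQ ID NO:4, and a further Gly-Leu-Pro-Gly-Asp-Ser before the TAA stop codon.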

(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 330 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:
CAGCGGGCCA GGAAGAAGAA TAAGAACTGC CGGCGCCACT CGCTCTATGT GGACTTCAGC 60
GATGTGGGCT GGAATGACTG GATTGTGGCC CCACCAGGCT ACCAGGCCTT CTACTGCCAT 120
GGGGACTGCC CCTTTCCACT GGCTGACCAC CTCAACTCAA CCAACCATGC CATTGTGCAG 180
ACCCTGGTCA ATTCTGTCAA TTCCAGTATC CCCAAAGCCT GTTGTGTGCC CACTGAACTG 240
AGTGCCATCT CCATGCTGTA CCTGGATGAG TATGATAAGG TGGTACTGAA AAATTATCAG 300
GAGATGGTAG TAGAGGGATG TGGGTGCCGC 330

(2) INFORMATION FOR SEQ ID NO: 6:



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1169 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: unknown 



20 



25 



(ii) MOLECULE TYPE: peptide 

30 (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

Gin Leu Ser Tyr Gly Tyr Asp Glu Lys Ser Thr Gly Gly lie Ser Val 
35 15 10 IS 

Pro Gly Pro Met Gly Pro Ser Gly Pro Arg Gly Leu Pro Gly Pro Pro 
20 25 30 

40 

Gly Ala Pro Gly Pro Gin Gly Phe Gin Gly Pro Pro Gly Glu Pro Gly 
35 40 45 



45 
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55 



Glu Pro Gly Ala Ser Gly Pro Met Gly Pro Arg Gly Pro Pro Gly Pro 
50 55 60 

Pro Gly Lys Asn Gly Asp Asp Gly Glu Ala Gly Lys Pro Gly Arg Pro 
65 70 75 80 
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Gly Glu Arg Gly Pro Pro Gly Pro Gin Gly Ala Arg Gly Leu Pro Gly 
85 90 95 



10 



15 



20 



25 



Thr Ala Gly Leu Pro Gly Met Lys Gly His Arg Gly Phe Ser Gly Leu
100 105 110

Asp Gly Ala Lys Gly Asp Ala Gly Pro Ala Gly Pro Lys Gly Glu Pro
115 120 125

Gly Ser Pro Gly Glu Asn Gly Ala Pro Gly Gln Met Gly Pro Arg Gly
130 135 140

Leu Pro Gly Glu Arg Gly Arg Pro Gly Ala Pro Gly Pro Ala Gly Ala
145 150 155 160

Arg Gly Asn Asp Gly Ala Thr Gly Ala Ala Gly Pro Pro Gly Pro Thr
165 170 175

Gly Pro Ala Gly Pro Pro Gly Phe Pro Gly Ala Val Gly Ala Lys Gly
180 185 190

Glu Ala Gly Pro Gln Gly Pro Arg Gly Ser Glu Gly Pro Gln Gly Val
195 200 205

Arg Gly Glu Pro Gly Pro Pro Gly Pro Ala Gly Ala Ala Gly Pro Ala
210 215 220

Gly Asn Pro Gly Ala Asp Gly Gln Pro Gly Ala Lys Gly Ala Asn Gly
225 230 235 240

Ala Pro Gly Ile Ala Gly Ala Pro Gly Phe Pro Gly Ala Arg Gly Pro
245 250 255



EP 0 992 586 A2 



Ser Gly Pro Gln Gly Pro Gly Gly Pro Pro Gly Pro Lys Gly Asn Ser
260 265 270

Gly Glu Pro Gly Ala Pro Gly Ser Lys Gly Asp Thr Gly Ala Lys Gly
275 280 285

Glu Pro Gly Pro Val Gly Val Gln Gly Pro Pro Gly Pro Ala Gly Glu
290 295 300

Glu Gly Lys Arg Gly Ala Arg Gly Glu Pro Gly Pro Thr Gly Leu Pro
305 310 315 320

Gly Pro Pro Gly Glu Arg Gly Gly Pro Gly Ser Arg Gly Phe Pro Gly
325 330 335

Ala Asp Gly Val Ala Gly Pro Lys Gly Pro Ala Gly Glu Arg Gly Ser
340 345 350

Pro Gly Pro Ala Gly Pro Lys Gly Ser Pro Gly Glu Ala Gly Arg Pro
355 360 365

Gly Glu Ala Gly Leu Pro Gly Ala Lys Gly Leu Thr Gly Ser Pro Gly
370 375 380

Ser Pro Gly Pro Asp Gly Lys Thr Gly Pro Pro Gly Pro Ala Gly Gln
385 390 395 400

Asp Gly Arg Pro Gly Pro Pro Gly Pro Pro Gly Ala Arg Gly Gln Ala
405 410 415

Gly Val Met Gly Phe Pro Gly Pro Lys Gly Ala Ala Gly Glu Pro Gly
420 425 430

Lys Ala Gly Glu Arg Gly Val Pro Gly Pro Pro Gly Ala Val Gly Pro
435 440 445

Ala Gly Lys Asp Gly Glu Ala Gly Ala Gln Gly Pro Pro Gly Pro Ala
450 455 460

Gly Pro Ala Gly Glu Arg Gly Glu Gln Gly Pro Ala Gly Ser Pro Gly
465 470 475 480

Phe Gln Gly Leu Pro Gly Pro Ala Gly Pro Pro Gly Glu Ala Gly Lys
485 490 495

Pro Gly Glu Gln Gly Val Pro Gly Asp Leu Gly Ala Pro Gly Pro Ser
500 505 510

Gly Ala Arg Gly Glu Arg Gly Phe Pro Gly Glu Arg Gly Val Gln Gly
515 520 525

Pro Pro Gly Pro Ala Gly Pro Arg Gly Ala Asn Gly Ala Pro Gly Asn
530 535 540

Asp Gly Ala Lys Gly Asp Ala Gly Ala Pro Gly Ala Pro Gly Ser Gln
545 550 555 560

Gly Ala Pro Gly Leu Gln Gly Met Pro Gly Glu Arg Gly Ala Ala Gly
565 570 575

Leu Pro Gly Pro Lys Gly Asp Arg Gly Asp Ala Gly Pro Lys Gly Ala
580 585 590

Asp Gly Ser Pro Gly Lys Asp Gly Val Arg Gly Leu Thr Gly Pro Ile
595 600 605

Gly Pro Pro Gly Pro Ala Gly Ala Pro Gly Asp Lys Gly Glu Ser Gly
610 615 620

Pro Ser Gly Pro Ala Gly Pro Thr Gly Ala Arg Gly Ala Pro Gly Asp
625 630 635 640

Arg Gly Glu Pro Gly Pro Pro Gly Pro Ala Gly Phe Ala Gly Pro Pro
645 650 655

Gly Ala Asp Gly Gln Pro Gly Ala Lys Gly Glu Pro Gly Asp Ala Gly
660 665 670

Ala Lys Gly Asp Ala Gly Pro Pro Gly Pro Ala Gly Pro Ala Gly Pro
675 680 685

Pro Gly Pro Ile Gly Asn Val Gly Ala Pro Gly Ala Lys Gly Ala Arg
690 695 700

Gly Ser Ala Gly Pro Pro Gly Ala Thr Gly Phe Pro Gly Ala Ala Gly 
705 710 715 720 

Arg Val Gly Pro Pro Gly Pro Ser Gly Asn Ala Gly Pro Pro Gly Pro 
725 730 735 

Pro Gly Pro Ala Gly Lys Glu Gly Gly Lys Gly Pro Arg Gly Glu Thr 
740 745 750 

Gly Pro Ala Gly Arg Pro Gly Glu Val Gly Pro Pro Gly Pro Pro Gly 
755 760 765 

Pro Ala Gly Glu Lys Gly Ser Pro Gly Ala Asp Gly Pro Ala Gly Ala
770 775 780

Pro Gly Thr Pro Gly Pro Gln Gly Ile Ala Gly Gln Arg Gly Val Val
785 790 795 800

Gly Leu Pro Gly Gln Arg Gly Glu Arg Gly Phe Pro Gly Leu Pro Gly
805 810 815

Pro Ser Gly Glu Pro Gly Lys Gln Gly Pro Ser Gly Ala Ser Gly Glu
820 825 830

Arg Gly Pro Pro Gly Pro Met Gly Pro Pro Gly Leu Ala Gly Pro Pro
835 840 845

Gly Glu Ser Gly Arg Glu Gly Ala Pro Ala Ala Glu Gly Ser Pro Gly
850 855 860

Arg Asp Gly Ser Pro Gly Ala Lys Gly Asp Arg Gly Glu Thr Gly Pro
865 870 875 880

Ala Gly Pro Pro Gly Ala Xaa Gly Ala Xaa Gly Ala Pro Gly Pro Val
885 890 895

Gly Pro Ala Gly Lys Ser Gly Asp Arg Gly Glu Thr Gly Pro Ala Gly
900 905 910

Pro Ala Gly Pro Val Gly Pro Ala Gly Ala Arg Gly Pro Ala Gly Pro
915 920 925

Gln Gly Pro Arg Gly Asp Lys Gly Glu Thr Gly Glu Gln Gly Asp Arg
930 935 940

Gly Ile Lys Gly His Arg Gly Phe Ser Gly Leu Gln Gly Pro Pro Gly
945 950 955 960

Pro Pro Gly Ser Pro Gly Glu Gln Gly Pro Ser Gly Ala Ser Gly Pro
965 970 975

Ala Gly Pro Arg Gly Pro Pro Gly Ser Ala Gly Ala Pro Gly Lys Asp
980 985 990

Gly Leu Asn Gly Leu Pro Gly Pro Ile Gly Pro Pro Gly Pro Arg Gly
995 1000 1005

Arg Thr Gly Asp Ala Gly Pro Val Gly Pro Pro Gly Pro Pro Gly Pro 
1010 1015 1020 

Pro Gly Pro Pro Gly Pro Pro Ser Ala Gly Phe Asp Phe Ser Phe Leu 
1025 1030 1035 1040 

Pro Gln Pro Pro Gln Glu Lys Ala His Asp Gly Gly Arg Tyr Tyr Arg
1045 1050 1055

Ala Arg Ser Gln Arg Ala Arg Lys Lys Asn Lys Asn Cys Arg Arg His
1060 1065 1070

Ser Leu Tyr Val Asp Phe Ser Asp Val Gly Trp Asn Asp Trp Ile Val
1075 1080 1085

Ala Pro Pro Gly Tyr Gln Ala Phe Tyr Cys His Gly Asp Cys Pro Phe
1090 1095 1100

Pro Leu Ala Asp His Leu Asn Ser Thr Asn His Ala Ile Val Gln Thr
1105 1110 1115 1120

Leu Val Asn Ser Val Asn Ser Ser Ile Pro Lys Ala Cys Cys Val Pro
1125 1130 1135

Thr Glu Leu Ser Ala Ile Ser Met Leu Tyr Leu Asp Glu Tyr Asp Lys
1140 1145 1150

Val Val Leu Lys Asn Tyr Gln Glu Met Val Val Glu Gly Cys Gly Cys
1155 1160 1165

Arg

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3531 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

GGGAAGGATT TCCATTTCCC AGCTGTCTTA TGGCTATGAT GAGAAATCAA CCGGAGGAAT 60

TTCCGTGCCT GGCCCCATGG GTCCCTCTGG TCCTCGTGGT CTCCCTGGCC CCCCTGGTGC 120 

ACCTGGTCCC CAAGGCTTCC AAGGTCCCCC TGGTGAGCCT GGCGAGCCTG GAGCTTCAGG 180 
TCCCATGGGT CCCCGAGGTC CCCCAGGTCC CCCTGGAAAG AATGGAGATG ATGGGGAAGC 240 

TGGAAAACCT GGTCGTCCTG GTGAGCGTGG GCCTCCTGGG CCTCAGGGTG CTCGAGGATT 300 
GCCCGGAACA GCTGGCCTCC CTGGAATGAA GGGACACAGA GGTTTCAGTG GTTTGGATGG 360

TGCCAAGGGA GATGCTGGTC CTGCTGGTCC TAAGGGTGAG CCTGGCAGCC CTGGTGAAAA 420 

TGGAGCTCCT GGTCAGATGG GCCCCCGTGG CCTGCCTGGT GAGAGAGGTC GCCCTGGAGC 480

CCCTGGCCCT GCTGGTGCTC GTGGAAATGA TGGTGCTACT GGTGCTGCCG GGCCCCCTGG 540 

TCCCACCGGC CCCGCTGGTC CTCCTGGCTT CCCTGGTGCT GTTGGTGCTA AGGGTGAAGC 600 

TGGTCCCCAA GGGCCCCGAG GCTCTGAAGG TCCCCAGGGT GTGCGTGGTG AGCCTGGCCC 660 

CCCTGGCCCT GCTGGTGCTG CTGGCCCTGC TGGAAACCCT GGTGCTGATG GACAGCCTGG 720 

TGCTAAAGGT GCCAATGGTG CTCCTGGTAT TGCTGGTGCT CCTGGCTTCC CTGGTGCCCG 780 

AGGCCCCTCT GGACCCCAGG GCCCCGGCGG CCCTCCTGGT CCCAAGGGTA ACAGCGGTGA 840 

ACCTGGTGCT CCTGGCAGCA AAGGAGACAC TGGTGCTAAG GGAGAGCCTG GCCCTGTTGG 900 

TGTTCAAGGA CCCCCTGGCC CTGCTGGAGA GGAAGGAAAG CGAGGAGCTC GAGGTGAACC 960 

CGGACCCACT GGCCTGCCCG GACCCCCTGG CGAGCGTGGT GGACCTGGTA GCCGTGGTTT 1020 

CCCTGGCGCA GATGGTGTTG CTGGTCCCAA GGGTCCCGCT GGTGAACGTG GTTCTCCTGG 1080 

CCCCGCTGGC CCCAAAGGAT CTCCTGGTGA AGCTGGTCGT CCCGGTGAAG CTGGTCTGCC 1140 

TGGTGCCAAG GGTCTGACTG GAAGCCCTGG CAGCCCTGGT CCTGATGGCA AAACTGGCCC 1200 

CCCTGGTCCC GCCGGTCAAG ATGGTCGCCC CGGACCCCCA GGCCCACCTG GTGCCCGTGG 1260 

TCAGGCTGGT GTGATGGGAT TCCCTGGACC TAAAGGTGCT GCTGGAGAGC CCGGCAAGGC 1320 

TGGAGAGCGA GGTGTTCCCG GACCCCCTGG CGCTGTCGGT CCTGCTGGCA AAGATGGAGA 1380 

GGCTGGAGCT CAGGGACCCC CTGGCCCTGC TGGTCCCGCT GGCGAGAGAG GTGAACAAGG 1440

CCCTGCTGGC TCCCCCGGAT TCCAGGGTCT CCCTGGTCCT GCTGGTCCTC CAGGTGAAGC 1500

AGGCAAACCT GGTGAACAGG GTGTTCCTGG AGACCTTGGC GCCCCTGGCC CCTCTGGAGC 1560 

AAGAGGCGAG AGAGGTTTCC CTGGCGAGCG TGGTGTGCAA GGTCCCCCTG GTCCTGCTGG 1620

ACCCCGAGGG GCCAACGGTG CTCCCGGCAA CGATGGTGCT AAGGGTGATG CTGGTGCCCC 1680 

TGGAGCTCCC GGTAGCCAGG GCGCCCCTGG CCTTCAGGGA ATGCCTGGTG AACGTGGTGC 1740

AGCTGGTCTT CCAGGGCCTA AGGGTGACAG AGGTGATGCT GGTCCCAAAG GTGCTGATGG 1800 

CTCTCCTGGC AAAGATGGCG TCCGTGGTCT GACCGGCCCC ATTGGTCCTC CTGGCCCTGC 1860

TGGTGCCCCT GGTGACAAGG GTGAAAGTGG TCCCAGCGGC CCTGCTGGTC CCACTGGAGC 1920 

TCGTGGTGCC CCCGGAGACC GTGGTGAGCC TGGTCCCCCC GGCCCTGCTG GCTTTGCTGG 1980 

CCCCCCTGGT GCTGACGGCC AACCTGGTGC TAAAGGCGAA CCTGGTGATG CTGGTGCCAA 2040 

AGGCGATGGG TCCCCCTGGG CCTGCCGGAC CCGCTGGACC CCCTGGCCCC ATTGGTAATG 2100 

TTGGTGCTCC TGGAGCCAAA GGTGCTCGCG GCAGCGCTGG TCCCCCTGGT GCTACTGGTT 2160 

TCCCTGGTGC TGCTGGCCGA GTCGGTCCTC CTGGCCCCTC TGGAAATGCT GGACCCCCTG 2220 

GCCCTCCTGG TCCTGCTGGC AAAGAAGGCG GCAAAGGTCC CCGTGGTGAG ACTGGCCCTG 2280 

CTGGACGTCC TGGTGAAGTT GGTCCCCCTG GTCCCCCTGG CCCTGCTGGC GAGAAAGGAT 2340 

CCCCTGGTGC TGATGGTCCT GCTGGTGCTC CTGGTACTCC CGGGCCTCAA GGTATTGCTG 2400 

GACAGCGTGG TGTGGTCGGC CTGCCTGGTC AGAGAGGAGA GAGAGGCTTC CCTGGTCTTC 2460

CTGGCCCCTC TGGTGAACCT GGCAAACAAG GTCCCTCTGG AGCAAGTGGT GAACGTGGTC 2520

CCCCCGGTCC CATGGGCCCC CCTGGATTGG CTGGACCCCC TGGTGAATCT GGACGTGAGG 2580

GGGCTCCTGC TGCCGAAGGT TCCCCTGGAC GAGACGGTTC TCCTGGCGCC AAGGGTGACC 2640

GTGGTGAGAC CGGCCCCGCT GGACCCCCTG GTGCTCTGGT GCTCTGGTGC CCCTGGCCCC 2700

GTTGGCCCTG CTGGCAAGAG TGGTGATCGT GGTGAGACTG GTCCTGCTGG TCCCGCCGGT 2760

CCCGTCGGCC CCGCTGGCGC CCGTGGCCCC GCCGGACCCC AAGGCCCCCG TGGTGACAAG 2820 

GGTGAGACAG GCGAACAGGG CGACAGAGGC ATAAAGGGTC ACCGTGGCTT CTCTGGCCTC 2880

CAGGGTCCCC CTGGCCCTCC TGGCTCTCCT GGTGAACAAG GTCCCTCTGG AGCCTCTGGT 2940 

CCTGCTGGTC CCCGAGGTCC CCCTGGCTCT GCTGGTGCTC CTGGCAAAGA TGGACTCAAC 3000 

GGTCTCCCTG GCCCCATTGG GCCCCCTGGT CCTCGCGGTC GCACTGGTGA TGCTGGTCCT 3060 

GTTGGTCCCC CCGGCCCTCC TGGACCTCCT GGTCCCCCTG GTCCTCCCAG CGCTGGTTTC 3120 

GACTTCAGCT TCCTCCCCCA GCCACCTCAA GAGAAGGCTC ACGATGGTGG CCGCTACTAC 3180

CGGGCTAGAT CCCAGCGGGC CAGGAAGAAG AATAAGAACT GCCGGCGCCA CTCGCTCTAT 3240 

GTGGACTTCA GCGATGTGGG CTGGAATGAC TGGATTGTGG CCCCACCAGG CTACCAGGCC 3300 

TTCTACTGCC ATGGGGACTG CCCCTTTCCA CTGGCTGACC ACCTCAACTC AACCAACCAT 3360 

GCCATTGTGC AGACCCTGGT CAATTCTGTC AATTCCAGTA TCCCCAAAGC CTGTTGTGTG 3420 

CCCACTGAAC TGAGTGCCAT CTCCATGCTG TACCTGGATG AGTATGATAA GGTGGTACTG 3480 

AAAAATTATC AGGAGATGGT AGTAGAGGGA TGTGGGTGCC GCTAAAAGCT T 3531

(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1171 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: unknown



(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

Gln Leu Ser Tyr Gly Tyr Asp Glu Lys Ser Thr Gly Gly Ile Ser Val
1 5 10 15

Pro Gly Pro Met Gly Pro Ser Gly Pro Arg Gly Leu Pro Gly Pro Pro
20 25 30

Gly Ala Pro Gly Pro Gln Gly Phe Gln Gly Pro Pro Gly Glu Pro Gly
35 40 45

Glu Pro Gly Ala Ser Gly Pro Met Gly Pro Arg Gly Pro Pro Gly Pro
50 55 60

Pro Gly Lys Asn Gly Asp Asp Gly Glu Ala Gly Lys Pro Gly Arg Pro
65 70 75 80

Gly Glu Arg Gly Pro Pro Gly Pro Gln Gly Ala Arg Gly Leu Pro Gly
85 90 95

Thr Ala Gly Leu Pro Gly Met Lys Gly His Arg Gly Phe Ser Gly Leu
100 105 110

Asp Gly Ala Lys Gly Asp Ala Gly Pro Ala Gly Pro Lys Gly Glu Pro
115 120 125

Gly Ser Pro Gly Glu Asn Gly Ala Pro Gly Gln Met Gly Pro Arg Gly
130 135 140

Leu Pro Gly Glu Arg Gly Arg Pro Gly Ala Pro Gly Pro Ala Gly Ala
145 150 155 160

Arg Gly Asn Asp Gly Ala Thr Gly Ala Ala Gly Pro Pro Gly Pro Thr
165 170 175

Gly Pro Ala Gly Pro Pro Gly Phe Pro Gly Ala Val Gly Ala Lys Gly 
180 185 190 

Glu Ala Gly Pro Gln Gly Pro Arg Gly Ser Glu Gly Pro Gln Gly Val
195 200 205

Arg Gly Glu Pro Gly Pro Pro Gly Pro Ala Gly Ala Ala Gly Pro Ala
210 215 220

Gly Asn Pro Gly Ala Asp Gly Gln Pro Gly Ala Lys Gly Ala Asn Gly
225 230 235 240

Ala Pro Gly Ile Ala Gly Ala Pro Gly Phe Pro Gly Ala Arg Gly Pro
245 250 255

Ser Gly Pro Gln Gly Pro Gly Gly Pro Pro Gly Pro Lys Gly Asn Ser
260 265 270

Gly Glu Pro Gly Ala Pro Gly Ser Lys Gly Asp Thr Gly Ala Lys Gly
275 280 285

Glu Pro Gly Pro Val Gly Val Gln Gly Pro Pro Gly Pro Ala Gly Glu
290 295 300

Glu Gly Lys Arg Gly Ala Arg Gly Glu Pro Gly Pro Thr Gly Leu Pro
305 310 315 320



Gly Pro Pro Gly Glu Arg Gly Gly Pro Gly Ser Arg Gly Phe Pro Gly 
325 330 335 

Ala Asp Gly Val Ala Gly Pro Lys Gly Pro Ala Gly Glu Arg Gly Ser 
340 345 350 

Pro Gly Pro Ala Gly Pro Lys Gly Ser Pro Gly Glu Ala Gly Arg Pro 
355 360 365 

Gly Glu Ala Gly Leu Pro Gly Ala Lys Gly Leu Thr Gly Ser Pro Gly 
370 375 380 

Ser Pro Gly Pro Asp Gly Lys Thr Gly Pro Pro Gly Pro Ala Gly Gln
385 390 395 400

Asp Gly Arg Pro Gly Pro Pro Gly Pro Pro Gly Ala Arg Gly Gln Ala
405 410 415

Gly Val Met Gly Phe Pro Gly Pro Lys Gly Ala Ala Gly Glu Pro Gly
420 425 430

Lys Ala Gly Glu Arg Gly Val Pro Gly Pro Pro Gly Ala Val Gly Pro
435 440 445

Ala Gly Lys Asp Gly Glu Ala Gly Ala Gln Gly Pro Pro Gly Pro Ala
450 455 460

Gly Pro Ala Gly Glu Arg Gly Glu Gln Gly Pro Ala Gly Ser Pro Gly
465 470 475 480

Phe Gln Gly Leu Pro Gly Pro Ala Gly Pro Pro Gly Glu Ala Gly Lys
485 490 495

Pro Gly Glu Gln Gly Val Pro Gly Asp Leu Gly Ala Pro Gly Pro Ser
500 505 510

Gly Ala Arg Gly Glu Arg Gly Phe Pro Gly Glu Arg Gly Val Gln Gly
515 520 525

Pro Pro Gly Pro Ala Gly Pro Arg Gly Ala Asn Gly Ala Pro Gly Asn
530 535 540

Asp Gly Ala Lys Gly Asp Ala Gly Ala Pro Gly Ala Pro Gly Ser Gln
545 550 555 560

Gly Ala Pro Gly Leu Gln Gly Met Pro Gly Glu Arg Gly Ala Ala Gly
565 570 575

Leu Pro Gly Pro Lys Gly Asp Arg Gly Asp Ala Gly Pro Lys Gly Ala
580 585 590

Asp Gly Ser Pro Gly Lys Asp Gly Val Arg Gly Leu Thr Gly Pro Ile
595 600 605

Gly Pro Pro Gly Pro Ala Gly Ala Pro Gly Asp Lys Gly Glu Ser Gly
610 615 620

Pro Ser Gly Pro Ala Gly Pro Thr Gly Ala Arg Gly Ala Pro Gly Asp
625 630 635 640

Arg Gly Glu Pro Gly Pro Pro Gly Pro Ala Gly Phe Ala Gly Pro Pro
645 650 655

Gly Ala Asp Gly Gln Pro Gly Ala Lys Gly Glu Pro Gly Asp Ala Gly
660 665 670

Ala Lys Gly Asp Ala Gly Pro Pro Gly Pro Ala Gly Pro Ala Gly Pro
675 680 685

Pro Gly Pro Ile Gly Asn Val Gly Ala Pro Gly Ala Lys Gly Ala Arg
690 695 700

Gly Ser Ala Gly Pro Pro Gly Ala Thr Gly Phe Pro Gly Ala Ala Gly 
705 710 715 720 

Arg Val Gly Pro Pro Gly Pro Ser Gly Asn Ala Gly Pro Pro Gly Pro 
725 730 735 

Pro Gly Pro Ala Gly Lys Glu Gly Gly Lys Gly Pro Arg Gly Glu Thr 
740 745 750 

Gly Pro Ala Gly Arg Pro Gly Glu Val Gly Pro Pro Gly Pro Pro Gly 
755 760 765 

Pro Ala Gly Glu Lys Gly Ser Pro Gly Ala Asp Gly Pro Ala Gly Ala 
770 775 780 

Pro Gly Thr Pro Gly Pro Gln Gly Ile Ala Gly Gln Arg Gly Val Val
785 790 795 800

Gly Leu Pro Gly Gln Arg Gly Glu Arg Gly Phe Pro Gly Leu Pro Gly
805 810 815

Pro Ser Gly Glu Pro Gly Lys Gln Gly Pro Ser Gly Ala Ser Gly Glu
820 825 830

Arg Gly Pro Pro Gly Pro Met Gly Pro Pro Gly Leu Ala Gly Pro Pro 
835 840 845 

Gly Glu Ser Gly Arg Glu Gly Ala Pro Gly Ala Glu Gly Ser Pro Gly 
850 855 860 

Arg Asp Gly Ser Pro Gly Ala Lys Gly Asp Arg Gly Glu Thr Gly Pro 
865 870 875 880 

Ala Gly Pro Pro Gly Ala Pro Gly Ala Pro Gly Ala Pro Gly Pro Val 
885 890 895 

Gly Pro Ala Gly Lys Ser Gly Asp Arg Gly Glu Thr Gly Pro Ala Gly 
900 905 910 

Pro Ala Gly Pro Val Gly Pro Ala Gly Ala Arg Gly Pro Ala Gly Pro 
915 920 925 

Gln Gly Pro Arg Gly Asp Lys Gly Glu Thr Gly Glu Gln Gly Asp Arg
930 935 940

Gly Ile Lys Gly His Arg Gly Phe Ser Gly Leu Gln Gly Pro Pro Gly
945 950 955 960

Pro Pro Gly Ser Pro Gly Glu Gln Gly Pro Ser Gly Ala Ser Gly Pro
965 970 975

Ala Gly Pro Arg Gly Pro Pro Gly Ser Ala Gly Ala Pro Gly Lys Asp
980 985 990

Gly Leu Asn Gly Leu Pro Gly Pro Ile Gly Pro Pro Gly Pro Arg Gly
995 1000 1005

Arg Thr Gly Asp Ala Gly Pro Val Gly Pro Pro Gly Pro Pro Gly Pro
1010 1015 1020

Pro Gly Pro Pro Gly Pro Pro Ser Ala Gly Phe Asp Phe Ser Phe Leu
1025 1030 1035 1040

Pro Gln Pro Pro Gln Glu Lys Ala His Asp Gly Gly Arg Tyr Tyr Arg
1045 1050 1055

Ala Arg Ser Ala Leu Asp Thr Asn Tyr Cys Phe Ser Ser Thr Glu Lys
1060 1065 1070

Asn Cys Cys Val Arg Gln Leu Tyr Ile Asp Phe Arg Lys Asp Leu Gly
1075 1080 1085

Trp Lys Trp Ile His Glu Pro Lys Gly Tyr His Ala Asn Phe Cys Leu
1090 1095 1100

Gly Pro Cys Pro Tyr Ile Trp Ser Leu Asp Thr Gln Tyr Ser Lys Val
1105 1110 1115 1120

Leu Ala Leu Tyr Asn Gln His Asn Pro Gly Ala Ser Ala Ala Pro Cys
1125 1130 1135

Cys Val Pro Gln Ala Leu Glu Pro Leu Pro Ile Val Tyr Tyr Val Gly
1140 1145 1150

Arg Lys Pro Lys Val Glu Gln Leu Ser Asn Met Ile Val Arg Ser Cys
1155 1160 1165

Lys Cys Ser
1170

(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3541 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

GGGAAGGATT TCCATTTCCC AGCTGTCTTA TGGCTATGAT GAGAAATCAA CCGGAGGAAT 60
TTCCGTGCCT GGCCCCATGG GTCCCTCTGG TCCTCGTGGT CTCCCTGGCC CCCCTGGTGC 120 

ACCTGGTCCC CAAGGCTTCC AAGGTCCCCC TGGTGAGCCT GGCGAGCCTG GAGCTTCAGG 180 
TCCCATGGGT CCCCGAGGTC CCCCAGGTCC CCCTGGAAAG AATGGAGATG ATGGGGAAGC 240 

TGGAAAACCT GGTCGTCCTG GTGAGCGTGG GCCTCCTGGG CCTCAGGGTG CTCGAGGATT 300 
GCCCGGAACA GCTGGCCTCC CTGGAATGAA GGGACACAGA GGTTTCAGTG GTTTGGATGG 360

TGCCAAGGGA GATGCTGGTC CTGCTGGTCC TAAGGGTGAG CCTGGCAGCC CTGGTGAAAA 420 

TGGAGCTCCT GGTCAGATGG GCCCCCGTGG CCTGCCTGGT GAGAGAGGTC GCCCTGGAGC 480

CCCTGGCCCT GCTGGTGCTC GTGGAAATGA TGGTGCTACT GGTGCTGCCG GGCCCCCTGG 540 

TCCCACCGGC CCCGCTGGTC CTCCTGGCTT CCCTGGTGCT GTTGGTGCTA AGGGTGAAGC 600 

TGGTCCCCAA GGGCCCCGAG GCTCTGAAGG TCCCCAGGGT GTGCGTGGTG AGCCTGGCCC 660 

CCCTGGCCCT GCTGGTGCTG CTGGCCCTGC TGGAAACCCT GGTGCTGATG GACAGCCTGG 720 

TGCTAAAGGT GCCAATGGTG CTCCTGGTAT TGCTGGTGCT CCTGGCTTCC CTGGTGCCCG 780 

AGGCCCCTCT GGACCCCAGG GCCCCGGCGG CCCTCCTGGT CCCAAGGGTA ACAGCGGTGA 840 

ACCTGGTGCT CCTGGCAGCA AAGGAGACAC TGGTGCTAAG GGAGAGCCTG GCCCTGTTGG 900 

TGTTCAAGGA CCCCCTGGCC CTGCTGGAGA GGAAGGAAAG CGAGGAGCTC GAGGTGAACC 960 

CGGACCCACT GGCCTGCCCG GACCCCCTGG CGAGCGTGGT GGACCTGGTA GCCGTGGTTT 1020 

CCCTGGCGCA GATGGTGTTG CTGGTCCCAA GGGTCCCGCT GGTGAACGTG GTTCTCCTGG 1080 

CCCCGCTGGC CCCAAAGGAT CTCCTGGTGA AGCTGGTCGT CCCGGTGAAG CTGGTCTGCC 1140 

TGGTGCCAAG GGTCTGACTG GAAGCCCTGG CAGCCCTGGT CCTGATGGCA AAACTGGCCC 1200 

CCCTGGTCCC GCCGGTCAAG ATGGTCGCCC CGGACCCCCA GGCCCACCTG GTGCCCGTGG 1260 

TCAGGCTGGT GTGATGGGAT TCCCTGGACC TAAAGGTGCT GCTGGAGAGC CCGGCAAGGC 1320 

TGGAGAGCGA GGTGTTCCCG GACCCCCTGG CGCTGTCGGT CCTGCTGGCA AAGATGGAGA 1380 

GGCTGGAGCT CAGGGACCCC CTGGCCCTGC TGGTCCCGCT GGCGAGAGAG GTGAACAAGG 1440 

CCCTGCTGGC TCCCCCGGAT TCCAGGGTCT CCCTGGTCCT GCTGGTCCTC CAGGTGAAGC 1500 

AGGCAAACCT GGTGAACAGG GTGTTCCTGG AGACCTTGGC GCCCCTGGCC CCTCTGGAGC 1560 

AAGAGGCGAG AGAGGTTTCC CTGGCGAGCG TGGTGTGCAA GGTCCCCCTG GTCCTGCTGG 1620

ACCCCGAGGG GCCAACGGTG CTCCCGGCAA CGATGGTGCT AAGGGTGATG CTGGTGCCCC 1680

TGGAGCTCCC GGTAGCCAGG GCGCCCCTGG CCTTCAGGGA ATGCCTGGTG AACGTGGTGC 1740 

AGCTGGTCTT CCAGGGCCTA AGGGTGACAG AGGTGATGCT GGTCCCAAAG GTGCTGATGG 1800 

CTCTCCTGGC AAAGATGGCG TCCGTGGTCT GACCGGCCCC ATTGGTCCTC CTGGCCCTGC 1860 

TGGTGCCCCT GGTGACAAGG GTGAAAGTGG TCCCAGCGGC CCTGCTGGTC CCACTGGAGC 1920

TCGTGGTGCC CCCGGAGACC GTGGTGAGCC TGGTCCCCCC GGCCCTGCTG GCTTTGCTGG 1980

CCCCCCTGGT GCTGACGGCC AACCTGGTGC TAAAGGCGAA CCTGGTGATG CTGGTGCCAA 2040 

AGGCGATGCT GGTCCCCCTG GGCCTGCCGG ACCCGCTGGA CCCCCTGGCC CCATTGGTAA 2100 

TGTTGGTGCT CCTGGAGCCA AAGGTGCTCG CGGCAGCGCT GGTCCCCCTG GTGCTACTGG 2160 

TTTCCCTGGT GCTGCTGGCC GAGTCGGTCC TCCTGGCCCC TCTGGAAATG CTGGACCCCC 2220

TGGCCCTCCT GGTCCTGCTG GCAAAGAAGG CGGCAAAGGT CCCCGTGGTG AGACTGGCCC 2280 

TGCTGGACGT CCTGGTGAAG TTGGTCCCCC TGGTCCCCCT GGCCCTGCTG GCGAGAAAGG 2340 

ATCCCCTGGT GCTGATGGTC CTGCTGGTGC TCCTGGTACT CCCGGGCCTC AAGGTATTGC 2400 

TGGACAGCGT GGTGTGGTCG GCCTGCCTGG TCAGAGAGGA GAGAGAGGCT TCCCTGGTCT 2460 

TCCTGGCCCC TCTGGTGAAC CTGGCAAACA AGGTCCCTCT GGAGCAAGTG GTGAACGTGG 2520 

TCCCCCCGGT CCCATGGGCC CCCCTGGATT GGCTGGACCC CCTGGTGAAT CTGGACGTGA 2580 

GGGGGCTCCT GCTGCCGAAG GTTCCCCTGG ACGAGACGGT TCTCCTGGCG CCAAGGGTGA 2640

CCGTGGTGAG ACCGGCCCCG CTGGACCCCC TGGTGCTCCT GGTGCTCCTG GTGCCCCTGG 2700

CCCCGTTGGC CCTGCTGGCA AGAGTGGTGA TCGTGGTGAG ACTGGTCCTG CTGGTCCCGC 2760 

CGGTCCCGTC GGCCCCGCTG GCGCCCGTGG CCCCGCCGGA CCCCAAGGCC CCCGTGGTGA 2820 

CAAGGGTGAG ACAGGCGAAC AGGGCGACAG AGGCATAAAG GGTCACCGTG GCTTCTCTGG 2880 

CCTCCAGGGT CCCCCTGGCC CTCCTGGCTC TCCTGGTGAA CAAGGTCCCT CTGGAGCCTC 2940

TGGTCCTGCT GGTCCCCGAG GTCCCCCTGG CTCTGCTGGT GCTCCTGGCA AAGATGGACT 3000 

CAACGGTCTC CCTGGCCCCA TTGGGCCCCC TGGTCCTCGC GGTCGCACTG GTGATGCTGG 3060 

TCCTGTTGGT CCCCCCGGCC CTCCTGGACC TCCTGGTCCC CCTGGTCCTC CCAGCGCTGG 3120 

TTTCGACTTC AGCTTCCTCC CCCAGCCACC TCAAGAGAAG GCTCACGATG GTGGCCGCTA 3180 

CTACCGGGCT AGATCTGCCC TGGACACCAA CTATTGCTTC AGCTCCACGG AGAAGAACTG 3240

CTGCGTGCGG CAGCTGTACA TTGACTTCCG CAAGGACCTC GGCTGGAAGT GGATCCACGA 3300 

GCCCAAGGGC TACCATGCCA ACTTCTGCCT CGGGCCCTGC CCCTACATTT GGAGCCTGGA 3360

CACGCAGTAC AGCAAGGTCC TGGCCCTGTA CAACCAGCAT AACCCGGGCG CCTCGGCGGC 3420 

GCCGTGCTGC GTGCCGCAGG CGCTGGAGCC GCTGCCCATC GTGTACTACG TGGGCCGCAA 3480 

GCCCAAGGTG GAGCAGCTGT CCAACATGAT CGTGCGCTCC TGCAAGTGCA GCTGATCTAG 3540 

A 3541



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1388 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 



Gln Leu Ser Tyr Gly Tyr Asp Glu Lys Ser Thr Gly Gly Ile Ser Val
1 5 10 15

Pro Gly Pro Met Gly Pro Ser Gly Pro Arg Gly Leu Pro Gly Pro Pro
20 25 30

Gly Ala Pro Gly Pro Gln Gly Phe Gln Gly Pro Pro Gly Glu Pro Gly
35 40 45

Glu Pro Gly Ala Ser Gly Pro Met Gly Pro Arg Gly Pro Pro Gly Pro
50 55 60

Pro Gly Lys Asn Gly Asp Asp Gly Glu Ala Gly Lys Pro Gly Arg Pro
65 70 75 80

Gly Glu Arg Gly Pro Pro Gly Pro Gln Gly Ala Arg Gly Leu Pro Gly
85 90 95

Thr Ala Gly Leu Pro Gly Met Lys Gly His Arg Gly Phe Ser Gly Leu
100 105 110

Asp Gly Ala Lys Gly Asp Ala Gly Pro Ala Gly Pro Lys Gly Glu Pro 
115 120 125 

Gly Ser Pro Gly Glu Asn Gly Ala Pro Gly Gln Met Gly Pro Arg Gly
130 135 140

Leu Pro Gly Glu Arg Gly Arg Pro Gly Ala Pro Gly Pro Ala Gly Ala
145 150 155 160

Arg Gly Asn Asp Gly Ala Thr Gly Ala Ala Gly Pro Pro Gly Pro Thr
165 170 175

Gly Pro Ala Gly Pro Pro Gly Phe Pro Gly Ala Val Gly Ala Lys Gly
180 185 190

Glu Ala Gly Pro Gln Gly Pro Arg Gly Ser Glu Gly Pro Gln Gly Val
195 200 205

Arg Gly Glu Pro Gly Pro Pro Gly Pro Ala Gly Ala Ala Gly Pro Ala
210 215 220

Gly Asn Pro Gly Ala Asp Gly Gln Pro Gly Ala Lys Gly Ala Asn Gly
225 230 235 240

Ala Pro Gly Ile Ala Gly Ala Pro Gly Phe Pro Gly Ala Arg Gly Pro
245 250 255

Ser Gly Pro Gln Gly Pro Gly Gly Pro Pro Gly Pro Lys Gly Asn Ser
260 265 270

Gly Glu Pro Gly Ala Pro Gly Ser Lys Gly Asp Thr Gly Ala Lys Gly
275 280 285

Glu Pro Gly Pro Val Gly Val Gln Gly Pro Pro Gly Pro Ala Gly Glu
290 295 300

Glu Gly Lys Arg Gly Ala Arg Gly Glu Pro Gly Pro Thr Gly Leu Pro
305 310 315 320

Gly Pro Pro Gly Glu Arg Gly Gly Pro Gly Ser Arg Gly Phe Pro Gly
325 330 335

Ala Asp Gly Val Ala Gly Pro Lys Gly Pro Ala Gly Glu Arg Gly Ser
340 345 350

Pro Gly Pro Ala Gly Pro Lys Gly Ser Pro Gly Glu Ala Gly Arg Pro
355 360 365

Gly Glu Ala Gly Leu Pro Gly Ala Lys Gly Leu Thr Gly Ser Pro Gly
370 375 380

Ser Pro Gly Pro Asp Gly Lys Thr Gly Pro Pro Gly Pro Ala Gly Gln
385 390 395 400

Asp Gly Arg Pro Gly Pro Pro Gly Pro Pro Gly Ala Arg Gly Gln Ala
405 410 415

Gly Val Met Gly Phe Pro Gly Pro Lys Gly Ala Ala Gly Glu Pro Gly
420 425 430

Lys Ala Gly Glu Arg Gly Val Pro Gly Pro Pro Gly Ala Val Gly Pro
435 440 445

Ala Gly Lys Asp Gly Glu Ala Gly Ala Gln Gly Pro Pro Gly Pro Ala
450 455 460

Gly Pro Ala Gly Glu Arg Gly Glu Gln Gly Pro Ala Gly Ser Pro Gly
465 470 475 480

Phe Gln Gly Leu Pro Gly Pro Ala Gly Pro Pro Gly Glu Ala Gly Lys
485 490 495

Pro Gly Glu Gln Gly Val Pro Gly Asp Leu Gly Ala Pro Gly Pro Ser
500 505 510

Gly Ala Arg Gly Glu Arg Gly Phe Pro Gly Glu Arg Gly Val Gln Gly
515 520 525

Pro Pro Gly Pro Ala Gly Pro Arg Gly Ala Asn Gly Ala Pro Gly Asn
530 535 540

Asp Gly Ala Lys Gly Asp Ala Gly Ala Pro Gly Ala Pro Gly Ser Gln
545 550 555 560

Gly Ala Pro Gly Leu Gln Gly Met Pro Gly Glu Arg Gly Ala Ala Gly
565 570 575

Leu Pro Gly Pro Lys Gly Asp Arg Gly Asp Ala Gly Pro Lys Gly Ala
580 585 590

Asp Gly Ser Pro Gly Lys Asp Gly Val Arg Gly Leu Thr Gly Pro Ile
595 600 605

Gly Pro Pro Gly Pro Ala Gly Ala Pro Gly Asp Lys Gly Glu Ser Gly
610 615 620

Pro Ser Gly Pro Ala Gly Pro Thr Gly Ala Arg Gly Ala Pro Gly Asp
625 630 635 640



Arg Gly Glu Pro Gly Pro Pro Gly Pro Ala Gly Phe Ala Gly Pro Pro 
645 650 655 

Gly Ala Asp Gly Gln Pro Gly Ala Lys Gly Glu Pro Gly Asp Ala Gly
660 665 670

Ala Lys Gly Asp Ala Gly Pro Pro Gly Pro Ala Gly Pro Ala Gly Pro
675 680 685

Pro Gly Pro Ile Gly Asn Val Gly Ala Pro Gly Ala Lys Gly Ala Arg
690 695 700

Gly Ser Ala Gly Pro Pro Gly Ala Thr Gly Phe Pro Gly Ala Ala Gly
705 710 715 720

Arg Val Gly Pro Pro Gly Pro Ser Gly Asn Ala Gly Pro Pro Gly Pro
725 730 735

Pro Gly Pro Ala Gly Lys Glu Gly Gly Lys Gly Pro Arg Gly Glu Thr
740 745 750

Gly Pro Ala Gly Arg Pro Gly Glu Val Gly Pro Pro Gly Pro Pro Gly
755 760 765

Pro Ala Gly Glu Lys Gly Ser Pro Gly Ala Asp Gly Pro Ala Gly Ala
770 775 780

Pro Gly Thr Pro Gly Pro Gln Gly Ile Ala Gly Gln Arg Gly Val Val
785 790 795 800

Gly Leu Pro Gly Gln Arg Gly Glu Arg Gly Phe Pro Gly Leu Pro Gly
805 810 815

Pro Ser Gly Glu Pro Gly Lys Gln Gly Pro Ser Gly Ala Ser Gly Glu
820 825 830

Arg Gly Pro Pro Gly Pro Met Gly Pro Pro Gly Leu Ala Gly Pro Pro
835 840 845

Gly Glu Ser Gly Arg Glu Gly Ala Pro Gly Ala Glu Gly Ser Pro Gly
850 855 860

Arg Asp Gly Ser Pro Gly Ala Lys Gly Asp Arg Gly Glu Thr Gly Pro
865 870 875 880

Ala Gly Pro Pro Gly Ala Pro Gly Ala Pro Gly Ala Pro Gly Pro Val
885 890 895

Gly Pro Ala Gly Lys Ser Gly Asp Arg Gly Glu Thr Gly Pro Ala Gly
900 905 910

Pro Ala Gly Pro Val Gly Pro Ala Gly Ala Arg Gly Pro Ala Gly Pro
915 920 925

Gln Gly Pro Arg Gly Asp Lys Gly Glu Thr Gly Glu Gln Gly Asp Arg
930 935 940

Gly Ile Lys Gly His Arg Gly Phe Ser Gly Leu Gln Gly Pro Pro Gly
945 950 955 960

Pro Pro Gly Ser Pro Gly Glu Gln Gly Pro Ser Gly Ala Ser Gly Pro
965 970 975

Ala Gly Pro Arg Gly Pro Pro Gly Ser Ala Gly Ala Pro Gly Lys Asp
980 985 990

Gly Leu Asn Gly Leu Pro Gly Pro Ile Gly Pro Pro Gly Pro Arg Gly
995 1000 1005

Arg Thr Gly Asp Ala Gly Pro Val Gly Pro Pro Gly Pro Pro Gly Pro 
1010 1015 1020 

Pro Gly Pro Pro Gly Pro Pro Ser Ala Gly Phe Asp Phe Ser Phe Leu
1025 1030 1035 1040

Pro Gln Pro Pro Gln Glu Lys Ala His Asp Gly Gly Arg Tyr Tyr Arg
1045 1050 1055

Ala Arg Ser Asp Glu Ala Ser Gly Ile Gly Pro Glu Val Pro Asp Asp
1060 1065 1070

Arg Asp Phe Glu Pro Ser Leu Gly Pro Val Cys Pro Phe Arg Cys Gln
1075 1080 1085

Cys His Leu Arg Val Val Gln Cys Ser Asp Leu Gly Leu Asp Lys Val
1090 1095 1100

Pro Lys Asp Leu Pro Pro Asp Thr Thr Leu Leu Asp Leu Gln Asn Asn
1105 1110 1115 1120

Lys Ile Thr Glu Ile Lys Asp Gly Asp Phe Lys Asn Leu Lys Asn Leu
1125 1130 1135

His Ala Leu Ile Leu Val Asn Asn Lys Ile Ser Lys Val Ser Pro Gly
1140 1145 1150

Ala Phe Thr Pro Leu Val Lys Leu Glu Arg Leu Tyr Leu Ser Lys Asn
1155 1160 1165

Gln Leu Lys Glu Leu Pro Glu Lys Met Pro Lys Thr Leu Gln Glu Leu
1170 1175 1180

Arg Ala His Glu Asn Glu Ile Thr Lys Val Arg Lys Val Thr Phe Asn
1185 1190 1195 1200

Gly Leu Asn Gln Met Ile Val Ile Glu Leu Gly Thr Asn Pro Leu Lys
1205 1210 1215

Ser Ser Gly Ile Glu Asn Gly Ala Phe Gln Gly Met Lys Lys Leu Ser
1220 1225 1230

Tyr Ile Arg Ile Ala Asp Thr Asn Ile Thr Ser Ile Pro Gln Gly Leu
1235 1240 1245

Pro Pro Ser Leu Thr Glu Leu His Leu Asp Gly Asn Lys Ile Ser Arg
1250 1255 1260

Val Asp Ala Ala Ser Leu Lys Gly Leu Asn Asn Leu Ala Lys Leu Gly
1265 1270 1275 1280

Leu Ser Phe Asn Ser Ile Ser Ala Val Asp Asn Gly Ser Leu Ala Asn
1285 1290 1295

Thr Pro His Leu Arg Glu Leu His Leu Asp Asn Asn Lys Leu Thr Arg
1300 1305 1310

Val Pro Gly Gly Leu Ala Glu His Lys Tyr Ile Gln Val Val Tyr Leu
1315 1320 1325

His Asn Asn Asn Ile Ser Val Val Gly Ser Ser Asp Phe Cys Pro Pro
1330 1335 1340

Gly His Asn Thr Lys Lys Ala Ser Tyr Ser Gly Val Ser Leu Phe Ser
1345 1350 1355 1360

Asn Pro Val Gln Tyr Trp Glu Ile Gln Pro Ser Thr Phe Arg Cys Val
1365 1370 1375

Tyr Val Arg Ser Ala Ile Gln Leu Gly Asn Tyr Lys
1380 1385

(2) INFORMATION FOR SEQ ID NO: 11: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1107 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: unknown 



(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

Gln Leu Ser Tyr Gly Tyr Asp Glu Lys Ser Thr Gly Gly Ile Ser Val
1 5 10 15

Pro Gly Pro Met Gly Pro Ser Gly Pro Arg Gly Leu Pro Gly Pro Pro
20 25 30

Gly Ala Pro Gly Pro Gln Gly Phe Gln Gly Pro Pro Gly Glu Pro Gly
35 40 45

Glu Pro Gly Ala Ser Gly Pro Met Gly Pro Arg Gly Pro Pro Gly Pro
50 55 60

Pro Gly Lys Asn Gly Asp Asp Gly Glu Ala Gly Lys Pro Gly Arg Pro
65 70 75 80

Gly Glu Arg Gly Pro Pro Gly Pro Gln Gly Ala Arg Gly Leu Pro Gly
85 90 95

Thr Ala Gly Leu Pro Gly Met Lys Gly His Arg Gly Phe Ser Gly Leu 
100 105 110 

Asp Gly Ala Lys Gly Asp Ala Gly Pro Ala Gly Pro Lys Gly Glu Pro 
115 120 125 

Gly Ser Pro Gly Glu Asn Gly Ala Pro Gly Gln Met Gly Pro Arg Gly
130 135 140

Leu Pro Gly Glu Arg Gly Arg Pro Gly Ala Pro Gly Pro Ala Gly Ala
145 150 155 160

Arg Gly Asn Asp Gly Ala Thr Gly Ala Ala Gly Pro Pro Gly Pro Thr
165 170 175

Gly Pro Ala Gly Pro Pro Gly Phe Pro Gly Ala Val Gly Ala Lys Gly
180 185 190

Glu Ala Gly Pro Gln Gly Pro Arg Gly Ser Glu Gly Pro Gln Gly Val
195 200 205

Arg Gly Glu Pro Gly Pro Pro Gly Pro Ala Gly Ala Ala Gly Pro Ala
210 215 220

Gly Asn Pro Gly Ala Asp Gly Gln Pro Gly Ala Lys Gly Ala Asn Gly
225 230 235 240

Ala Pro Gly Ile Ala Gly Ala Pro Gly Phe Pro Gly Ala Arg Gly Pro
245 250 255

Ser Gly Pro Gln Gly Pro Gly Gly Pro Pro Gly Pro Lys Gly Asn Ser
260 265 270



Gly Glu Pro Gly Ala Pro Gly Ser Lys Gly Asp Thr Gly Ala Lys Gly 
275 280 285 

Glu Pro Gly Pro Val Gly Val Gin Gly Pro Pro Gly Pro Ala Gly Glu 
290 295 300 

Glu Gly Lys Arg Gly Ala Arg Gly Glu Pro Gly Pro Thr Gly Leu Pro 
305 310 315 320 

Gly Pro Pro Gly Glu Arg Gly Gly Pro Gly Ser Arg Gly Phe Pro Gly 
325 330 335 

Ala Asp Gly Val Ala Gly Pro Lys Gly Pro Ala Gly Glu Arg Gly Ser 
340 345 350 

Pro Gly Pro Ala Gly Pro Lys Gly Ser Pro Gly Glu Ala Gly Arg Pro 
355 360 365 

Gly Glu Ala Gly Leu Pro Gly Ala Lys Gly Leu Thr Gly Ser Pro Gly 
370 375 380 

Ser Pro Gly Pro Asp Gly Lys Thr Gly Pro Pro Gly Pro Ala Gly Gln
385 390 395 400 



Asp Gly Arg Pro Gly Pro Pro Gly Pro Pro Gly Ala Arg Gly Gln Ala
405 410 415

Gly Val Met Gly Phe Pro Gly Pro Lys Gly Ala Ala Gly Glu Pro Gly
420 425 430 

Lys Ala Gly Glu Arg Gly Val Pro Gly Pro Pro Gly Ala Val Gly Pro 
435 440 445 

Ala Gly Lys Asp Gly Glu Ala Gly Ala Gln Gly Pro Pro Gly Pro Ala
450 455 460

Gly Pro Ala Gly Glu Arg Gly Glu Gln Gly Pro Ala Gly Ser Pro Gly
465 470 475 480 



Phe Gln Gly Leu Pro Gly Pro Ala Gly Pro Pro Gly Glu Ala Gly Lys
485 490 495

Pro Gly Glu Gln Gly Val Pro Gly Asp Leu Gly Ala Pro Gly Pro Ser
500 505 510

Gly Ala Arg Gly Glu Arg Gly Phe Pro Gly Glu Arg Gly Val Gln Gly
515 520 525 

Pro Pro Gly Pro Ala Gly Pro Arg Gly Ala Asn Gly Ala Pro Gly Asn 
530 535 540 

Asp Gly Ala Lys Gly Asp Ala Gly Ala Pro Gly Ala Pro Gly Ser Gln
545 550 555 560

Gly Ala Pro Gly Leu Gln Gly Met Pro Gly Glu Arg Gly Ala Ala Gly
565 570 575 

Leu Pro Gly Pro Lys Gly Asp Arg Gly Asp Ala Gly Pro Lys Gly Ala 
580 585 590

Asp Gly Ser Pro Gly Lys Asp Gly Val Arg Gly Leu Thr Gly Pro Ile
595 600 605

Gly Pro Pro Gly Pro Ala Gly Ala Pro Gly Asp Lys Gly Glu Ser Gly
610 615 620

Pro Ser Gly Pro Ala Gly Pro Thr Gly Ala Arg Gly Ala Pro Gly Asp
625 630 635 640

Arg Gly Glu Pro Gly Pro Pro Gly Pro Ala Gly Phe Ala Gly Pro Pro 
645 650 655 

Gly Ala Asp Gly Gln Pro Gly Ala Lys Gly Glu Pro Gly Asp Ala Gly
660 665 670

Ala Lys Gly Asp Ala Gly Pro Pro Gly Pro Ala Gly Pro Ala Gly Pro
675 680 685

Pro Gly Pro Ile Gly Asn Val Gly Ala Pro Gly Ala Lys Gly Ala Arg
690 695 700 

Gly Ser Ala Gly Pro Pro Gly Ala Thr Gly Phe Pro Gly Ala Ala Gly 
705 710 715 720 



Arg Val Gly Pro Pro Gly Pro Ser Gly Asn Ala Gly Pro Pro Gly Pro 
725 730 735 



Pro Gly Pro Ala Gly Lys Glu Gly Gly Lys Gly Pro Arg Gly Glu Thr 
740 745 750 



Gly Pro Ala Gly Arg Pro Gly Glu Val Gly Pro Pro Gly Pro Pro Gly 
755 760 765 



Pro Ala Gly Glu Lys Gly Ser Pro Gly Ala Asp Gly Pro Ala Gly Ala
770 775 780 

Pro Gly Thr Pro Gly Pro Gln Gly Ile Ala Gly Gln Arg Gly Val Val
785 790 795 800

Gly Leu Pro Gly Gln Arg Gly Glu Arg Gly Phe Pro Gly Leu Pro Gly
805 810 815

Pro Ser Gly Glu Pro Gly Lys Gln Gly Pro Ser Gly Ala Ser Gly Glu
820 825 830 

Arg Gly Pro Pro Gly Pro Met Gly Pro Pro Gly Leu Ala Gly Pro Pro 
835 840 845 

Gly Glu Ser Gly Arg Glu Gly Ala Pro Gly Ala Glu Gly Ser Pro Gly 
850 855 860 

Arg Asp Gly Ser Pro Gly Ala Lys Gly Asp Arg Gly Glu Thr Gly Pro 
865 870 875 880 

Ala Gly Pro Pro Gly Ala Pro Gly Ala Pro Gly Ala Pro Gly Pro Val 
885 890 895 

Gly Pro Ala Gly Lys Ser Gly Asp Arg Gly Glu Thr Gly Pro Ala Gly 
900 905 910 

Pro Ala Gly Pro Val Gly Pro Ala Gly Ala Arg Gly Pro Ala Gly Pro 
915 920 925 

Gln Gly Pro Arg Gly Asp Lys Gly Glu Thr Gly Glu Gln Gly Asp Arg
930 935 940

Gly Ile Lys Gly His Arg Gly Phe Ser Gly Leu Gln Gly Pro Pro Gly
945 950 955 960

Pro Pro Gly Ser Pro Gly Glu Gln Gly Pro Ser Gly Ala Ser Gly Pro
965 970 975 

Ala Gly Pro Arg Gly Pro Pro Gly Ser Ala Gly Ala Pro Gly Lys Asp 
980 985 990 

Gly Leu Asn Gly Leu Pro Gly Pro Ile Gly Pro Pro Gly Pro Arg Gly
995 1000 1005 



Arg Thr Gly Asp Ala Gly Pro Val Gly Pro Pro Gly Pro Pro Gly Pro 
1010 1015 1020 

Pro Gly Pro Pro Gly Pro Pro Ser Ala Gly Phe Asp Phe Ser Phe Leu 
1025 1030 1035 1040 

Pro Gln Pro Pro Gln Glu Lys Ala His Asp Gly Gly Arg Tyr Tyr Arg
1045 1050 1055

Ala Arg Ser Pro Lys Asp Leu Pro Pro Asp Thr Thr Leu Leu Asp Leu
1060 1065 1070

Gln Asn Asn Lys Ile Thr Glu Ile Lys Asp Gly Asp Phe Lys Asn Leu
1075 1080 1085

Lys Asn Leu His Ala Leu Ile Leu Val Asn Asn Lys Ile Ser Lys Val
1090 1095 1100 

Ser Pro Gly 
1105

(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4167 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

CAGCTGTCTT ATGGCTATGA TGAGAAATCA ACCGGAGGAA TTTCCGTGCC TGGCCCCATG 60
GGTCCCTCTG GTCCTCGTGG TCTCCCTGGC CCCCCTGGTG CACCTGGTCC CCAAGGCTTC 120
CAAGGTCCCC CTGGTGAGCC TGGCGAGCCT GGAGCTTCAG GTCCCATGGG TCCCCGAGGT 180
CCCCCAGGTC CCCCTGGAAA GAATGGAGAT GATGGGGAAG CTGGAAAACC TGGTCGTCCT 240
GGTGAGCGTG GGCCTCCTGC GCCTCAGGGT GCTCGAGGAT TGCCCGGAAC AGCTGGCCTC 300
CCTGGAATGA AGGGACACAG AGGTTTCAGT GGTTTGGATG GTGCCAAGGG AGATGCTGGT 360
CCTGCTGGTC CTAAGGGTGA GCCTGGCAGC CCTGGTGAAA ATGGAGCTCC TGGTCAGATG 420
GGCCCCCGTG GCCTGCCTGG TGAGAGAGGT CGCCCTGGAG CCCCTGGCCC TGCTGGTGCT 480
CGTGGAAATG ATGGTGCTAC TGGTGCTGCC GGGCCCCCTG GTCCCACCGG CCCCGCTGGT 540
CCTCCTGGCT TCCCTGGTGC TGTTGGTGCT AAGGGTGAAG CTGGTCCCCA AGGGCCCCGA 600
GGCTCTGAAG GTCCCCAGGG TGTGCGTGGT GAGCCTGGCC CCCCTGGCCC TGCTGGTGCT 660

GCTGGCCCTG CTGGAAACCC TGGTGCTGAT GGACAGCCTG GTGCTAAAGG TGCCAATGGT 720
GCTCCTGGTA TTGCTGGTGC TCCTGGCTTC CCTGGTGCCC GAGGCCCCTC TGGACCCCAG 780

GGCCCCGGCG GCCCTCCTGG TCCCAAGGGT AACAGCGGTG AACCTGGTGC TCCTGGCAGC 840

AAAGGAGACA CTGGTGCTAA GGGAGAGCCT GGCCCTGTTG GTGTTCAAGG ACCCCCTGGC 900 
CCTGCTGGAG AGCAAGGAAA GCGAGGAGCT CGAGGTGAAC CCGGACCCAC TGGCCTGCCC 960 

GGACCCCCTG GCGAGCGTGG TGGACCTGGT AGCCGTGGTT TCCCTGGCGC AGATGGTGTT 1020
GCTGGTCCCA AGGGTCCCGC TGGTGAACGT GGTTCTCCTG GCCCCGCTGG CCCCAAAGGA 1080

TCTCCTCGTG AAGCTGGTCG TCCCGGTGAA GCTGGTCTGC CTGGTGCCAA GGGTCTGACT 1140 

GGAAGCCCTG GCAGCCCTGG TCCTGATGGC AAAACTGGCC CCCCTGGTCC CGCCGGTCAA 1200 
GATGGTCGCC CCGGACCCCC AGGCCCACCT GGTGCCCGTG GTCAGGCTGG TGTGATGGGA 1260 

TTCCCTGGAC CTAAAGGTGC TGCTCGAGAG CCCGGCAAGG CTGGAGAGCG AGGTGTTCCC 1320 
GGACCCCCTC GCGCTGTCGG TCCTGCTGGC AAAGATGGAG AGGCTGGAGC TCAGGGACCC 1380

CCTGGCCCTG CTGGTCCCGC TGGCGAGAGA GGTGAACAAG GCCCTGCTGG CTCCCCCGGA 1440 

TTCCAGGGTC TCCCTGGTCC TGCTGGTCCT CCAGGTGAAG CAGGCAAACC TGGTGAACAG 1500 
GGTGTTCCTG GAGACCTTGG CGCCCCTGGC CCCTCTGGAG CAAGAGGCGA GAGAGGTTTC 1560 

CCTGGCGAGC GTGGTGTGCA AGGTCCCCCT GGTCCTGCTG GACCCCGAGG GGCCAACGGT 1620 

GCTCCCGCCA ACGATGCTGC TAAGGGTGAT GCTGGTGCCC CTGGAGCTCC CGGTAGCCAG 1680

GGCGCCCCTG GCCTTCAGGG AATGCCTGGT GAACGTGGTG CAGCTGGTCT TCCAGGGCCT 1740 

AAGGGTGACA GAGGTGATGC TGGTCCCAAA GGTGCTGATG GCTCTCCTGG CAAAGATGGC 1800 

GTCCGTGGTC TGACCGACCC CATTGGTCCT CCTGGCCCTG CTGGTGCCCC TGGTGACAAG 1860

GGTGAAAGTG GTCCCAGCGG CCCTGCTGGT CCCACTGGAG CTCGTGGTGC CCCCGGAGAC 1920

CGTGGTGAGC CTGGTCCCCC CGGCCCTGCT GGCTTTGCTG GCCCCCCTGG TGCTGACGGC 1980 

CAACCTGGTG CTAAAGGCGA ACCTGGTGAT GCTGGTGCCA AAGGCGATGC TGGTCCCCCT 2040 

GGGCCTGCCG GACCCGCTGG ACCCCCTGGC CCCATTGGTA ATGTTGGTGC TCCTGGAGCC 2100 

AAACGTGCTC GCGGCAGCGC TGGTCCCCCT GGTGCTACTG GTTTCCCTGG TGCTGCTGGC 2160 

CGAGTCGGTC CTCCTGGCCC CTCTGGAAAT GCTGGACCCC CTGGCCCTCC TGGTCCTGCT 2220

GGCAAAGAAG GCGGCAAAGG TCCCCGTGGT GAGACTGGCC CTGCTGGACG TCCTGGTGAA 2280 

GTTGGTCCCC CTGGTCCCCC TGGCCCTGCT GGCGAGAAAG GATCCCCTGG TGCTGATGGT 2340 

CCTGCTGGTG CTCCTGGTAC TCCCGGGCCT CAAGGTATTG CTGGACAGCG TGGTGTGGTC 2400 

GGCCTGCCTG GTCAGAGAGG AGAGAGAGGC TTCCCTGGTC TTCTTGGCCC CTCTGGTGAA 2460 

CCTGGCAAAC AAGGTCCCTC TGGAGCAAGT GGTGAACGTG GTCCCCCCGG TCCCATGGGC 2520

CCCCCTGGAT TGGCTGGACC CCCTGGTGAA TCTGGACGTG AGGGGGCTCC TGCTGCCGAA 2580 

GGTTCCCCTG GACGAGACGG TTCTCCTGGC GCCAAGGGTG ACCGTGGTGA GACCGGCCCC 2640 

GCTGGACCCC CTGGTGCTCC TGGTGCTCCT GGTGCCCCTG GCCCCGTTGG CCCTGCTGGC 2700

AAGAGTGGTG ATCGTGGTGA GACTGGTCCT GCTGGTCCCG CCGGTCCCGT CGGCCCCGCT 2760 

GGCGCCCGTG GCCCCGCCGG ACCCCAAGGC CCCCGTGGTG ACAAGGGTGA GACAGGCGAA 2820

CAGGGCGACA GAGGCATAAA GGGTCACCGT GGCTTCTCTG GCCTCCAGGG TCCCCCTGGC 2880 

CCTCCTGGCT CTCCTGGTGA ACAAGGTCCC TCTGGAGCCT CTGGTCCTGC TGGTCCCCGA 2940 

GGTCCCCCTG GCTCTGCTGG TGCTCCTGGC AAAGATGGAC TCAACGGTCT CCCTGGCCCC 3000 

ATTGGGCCCC CTGGTCCTCG CGGTCGCACT GGTGATGCTG GTCCTGTTGG TCCCCCCGGC 3060

CCTCCTGGAC CTCCTGGTCC CCCTGGTCCT CCCAGCGCTG GTTTCGACTT CAGCTTCCTC 3120

CCCCAGCCAC CTCAAGAGAA GGCTCACGAT GGTGGCCGCT ACTACCGGGC TAGATCCGAT 3180 

GAGGCTTCTG GGATAGCCCC AGAAGTTCCT GATGACCGCG ACTTCGAGCC CTCCCTAGGC 3240 

CCAGTGTGCC CCTTCCGCTG TCAATGCCAT CTTCGAGTGG TCCAGTGTTC TGATTTGGGT 3300 

CTGGACAAAG TGCCAAAGGA TCTTCCCCCT GACACAACTC TGCTAGACCT GCAAAACAAC 3360 

AAAATAACCG AAATCAAAGA TGGAGACTTT AAGAACCTGA AGAACCTTCA CGCATTGATT 3420

CTTGTCAACA ATAAAATTAG CAAAGTTAGT CCTGGAGCAT TTACACCTTT GGTGAAGTTG 3480 

GAACGACTTT ATCTGTCCAA GAATCAGCTG AAGGAATTGC CAGAAAAAAT GCCCAAAACT 3540 

CTTCAGGAGC TGCGTGCCCA TGAGAATGAG ATCACCAAAG TGCGAAAAGT TACTTTCAAT 3600

GGACTGAACC AGATGATTGT CATAGAACTG GGCACCAATC CGCTGAAGAG CTCAGGAATT 3660 

GAAAATGGGG CTTTCCAGGG AATGAAGAAG CTCTCCTACA TCCGCATTGC TGATACCAAT 3720

ATCACCAGCA TTCCTCAAGG TCTTCCTCCT TCCCTTACGG AATTACATCT TGATGGCAAC 3780 
AAAATCAGCA GAGTTGATGC AGCTAGCCTG AAAGGACTGA ATAATTTGGC TAAGTTGGGA 3840

TTGAGTTTCA ACAGCATCTC TGCTGTTGAC AATGGCTCTC TGGCCAACAC GCCTCATCTG 3900 
AGGGAGCTTC ACTTGGACAA CAACAAGCTT ACCAGAGTAC CTGGTGGGCT GGCAGAGCAT 3960 
AAGTACATCC AGGTTGTCTA CCTTCATAAC AACAATATCT CTGTAGTTGG ATCAAGTGAC 4020 

TTCTGCCCAC CTGGACACAA CACCAAAAAG GCTTCTTATT CGGGTGTGAG TCTTTTCAGC 4080 
AACCCGGTCC AGTACTGGGA GATACAGCCA TCCACCTTCA GATGTGTCTA CGTGCGCTCT 4140

GCCATTCAAC TCGGAAACTA TAAGTAA 4167 

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3349 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 
GGGAAGGATT TCCATTTCCC AGCTGTCTTA TGGCTATGAT GAGAAATCAA CCGGAGGAAT 60 

TTCCGTGCCT GGCCCCATGG GTCCCTCTGG TCCTCGTGGT CTCCCTGGCC CCCCTGGTGC 120

ACCTGGTCCC CAAGGCTTCC AAGGTCCCCC TGGTGAGCCT GGCGAGCCTG GAGCTTCAGG 180 

TCCCATGGGT CCCCGAGGTC CCCCAGGTCC CCCTGGAAAG AATGGAGATG ATGGGGAAGC 240 

TGGAAAACCT GGTCGTCCTG GTGAGCGTGG GCCTCCTGGG CCTCAGGGTG CTCGAGGATT 300 

GCCCGGAACA GCTGGCCTCC CTGGAATGAA GGGACACAGA GGTTTCAGTG GTTTGGATGG 360 

TGCCAAGGGA GATGCTGGTC CTGCTGGTCC TAAGGGTGAG CCTGGCAGCC CTGGTGAAAA 420 

TGGAGCTCCT GGTCAGATGG GCCCCCGTGG CCTGCCTGGT GAGAGAGGTC GCCCTGGAGC 480 

CCCTGGCCCT GCTGGTGCTC GTGGAAATGA TGGTGCTACT GGTGCTGCCG GGCCCCCTGG 540

TCCCACCGGC CCCGCTGGTC CTCCTGGCTT CCCTGGTGCT GTTGGTGCTA AGGGTGAAGC 600 

TGGTCCCCAA GGGCCCCGAG GCTCTGAAGG TCCCCAGGGT GTGCGTGGTG AGCCTGGCCC 660 

CCCTGGCCCT GCTGGTGCTG CTGGCCCTGC TGGAAACCCT GGTGCTGATG GACAGCCTGG 720 

TGCTAAAGGT GCCAATGGTG CTCCTGGTAT TGCTGGTGCT CCTGGCTTCC CTGGTGCCCG 780 

AGGCCCCTCT GGACCCCAGG GCCCCGGCGG CCCTCCTGGT CCCAAGGGTA ACAGCGGTGA 840

ACCTGGTGCT CCTGGCAGCA AAGGAGACAC TGGTGCTAAG GGAGAGCCTG GCCCTGTTGG 900 

TGTTCAAGGA CCCCCTGGCC CTGCTGGAGA GGAAGGAAAG CGAGGAGCTC GAGGTGAACC 960 

CGGACCCACT GGCCTGCCCG GACCCCCTGG CGAGCGTGGT GGACCTGGTA GCCGTGGTTT 1020 

CCCTGGCGCA GATGGTGTTG CTGGTCCCAA GGGTCCCGCT GGTGAACGTG GTTCTCCTGG 1080 

CCCCGCTGGC CCCAAAGGAT CTCCTGGTGA AGCTGGTCGT CCCGGTGAAG CTGGTCTGCC 1140

TGGTGCCAAG GGTCTGACTG GAAGCCCTGG CAGCCCTGGT CCTGATGGCA AAACTGGCCC 1200 

CCCTGGTCCC GCCGGTCAAG ATGGTCGCCC CGGACCCCCA GGCCCACCTG GTGCCCGTGG 1260 

TCAGGCTGGT GTGATGGGAT TCCCTGGACC TAAAGGTGCT GCTGGAGAGC CCGGCAAGGC 1320 

TGGAGAGCGA GGTGTTCCCG GACCCCCTGG CGCTGTCGGT CCTGCTGGCA AAGATGGAGA 1380 

GGCTGGAGCT CAGGGACCCC CTGGCCCTGC TGGTCCCGCT GGCGAGAGAG GTGAACAAGG 1440 

CCCTGCTGGC TCCCCCGGAT TCCAGGGTCT CCCTGGTCCT GCTGGTCCTC CAGGTGAAGC 1500 

AGGCAAACCT GGTGAACAGG GTGTTCCTGG AGACCTTGGC GCCCCTGGCC CCTCTGGAGC 1560 

AAGAGGCGAG AGAGGTTTCC CTGGCGAGCG TGGTGTGCAA GGTCCCCCTG GTCCTGCTGG 1620 

ACCCCGAGGG GCCAACGGTG CTCCCGGCAA CGATGGTGCT AAGGGTGATG CTGGTGCCCC 1680

TGGAGCTCCC GGTAGCCAGG GCGCCCCTGG CCTTCAGGGA ATGCCTGGTG AACGTGGTGC 1740 

AGCTGGTCTT CCAGGGCCTA AGGGTGACAG AGGTGATGCT GGTCCCAAAG GTGCTGATGG 1800 

CTCTCCTGGC AAAGATGGCG TCCGTGGTCT GACCGGCCCC ATTGGTCCTC CTGGCCCTGC 1860 

TGGTGCCCCT GGTGACAAGG GTGAAAGTGG TCCCAGCGGC CCTGCTGGTC CCACTGGAGC 1920 

TCGTGGTGCC CCCGGAGACC GTGGTGAGCC TGGTCCCCCC GGCCCTGCTG GCTTTGCTGG 1980 

CCCCCCTGGT GCTGACGGCC AACCTGGTGC TAAAGGCGAA CCTGGTGATG CTGGTGCCAA 2040 

AGGCGATGCT GGTCCCCCTG GGCCTGCCGG ACCCGCTGGA CCCCCTGGCC CCATTGGTAA 2100

TTTCGACTTC AGCTTCCTCC CCCAGCCACC TCAAGAGAAG GCTCACGATG GTGGCCGCTA 3180

CTACCGGGCT AGATCTCCAA AGGATCTTCC CCCTGACACA ACTCTGCTAG ACCTGCAAAA 3240 

CAACAAAATA ACCGAAATCA AAGATGGAGA CTTTAAGAAC CTGAAGAACC TTCACGCATT 3300 

GATTCTTGTC AACAATAAAA TTAGCAAAGT TAGTCCTGGA TAACTGCAG 3349 
(2) INFORMATION FOR SEQ ID NO: 14:



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 57 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:14: 
ATCGAGGGAA GGATTTCAGA ATTCGGATCC TCTAGAGTCG ACCTGCAGGC AAGCTTG 57 
(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3171 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

CAGCTGTCTT ATGGCTATGA TGAGAAATCA ACCGGAGGAA TTTCCGTGCC TGGCCCCATG 60

GGTCCCTCTG GTCCTCGTGG TCTCCCTGGC CCCCCTGGTG CACCTGGTCC CCAAGGCTTC 120 

CAAGGTCCCC CTGGTGAGCC TGGCGAGCCT GGAGCTTCAG GTCCCATGGG TCCCCGAGGT 180 

CCCCCAGGTC CCCCTGGAAA GAATGGAGAT GATGGGGAAG CTGGAAAACC TGGTCGTCCT 240 

GGTGAGCGTG GGCCTCCTGG GCCTCAGGGT GCTCGAGGAT TGCCCGGAAC AGCTGGCCTC 300 

CCTGGAATGA AGGGACACAG AGGTTTCAGT GGTTTGGATG GTGCCAAGGG AGATGCTGGT 360 

CCTGCTGGTC CTAAGGGTGA GCCTGGCAGC CCTGGTGAAA ATGGAGCTCC TGGTCAGATG 420

GGCCCCCGTG GCCTGCCTGG TGAGAGAGGT CGCCCTGGAG CCCCTGGCCC TGCTGGTGCT 480 

CGTGGAAATG ATGGTGCTAC TGGTGCTGCC GGGCCCCCTG GTCCCACCGG CCCCGCTGGT 540 

CCTCCTGGCT TCCCTGGTGC TGTTGGTGCT AAGGGTGAAG CTGGTCCCCA AGGGCCCCGA 600 

GGCTCTGAAG GTCCCCAGGG TGTGCGTGGT GAGCCTGGCC CCCCTGGCCC TGCTGGTGCT 660 

GCTGGCCCTG CTGGAAACCC TGGTGCTGAT GGACAGCCTG GTGCTAAAGG TGCCAATGGT 720

GCTCCTGGTA TTGCTGGTGC TCCTGGCTTC CCTGGTGCCC GAGGCCCCTC TGGACCCCAG 780

GGCCCCGGCG GCCCTCCTGG TCCCAAGGGT AACAGCGGTG AACCTGGTGC TCCTGGCAGC 840 

AAAGGAGACA CTGGTGCTAA GGGAGAGCCT GGCCCTGTTG GTGTTCAAGG ACCCCCTGGC 900 

CCTGCTGGAG AGGAAGGAAA GCGAGGAGCT CGAGGTGAAC CCGGACCCAC TGGCCTGCCC 960 

GGACCCCCTG GCGAGCGTGG TGGACCTGGT AGCCGTGGTT TCCCTGGCGC AGATGGTGTT 1020 

GCTGGTCCCA AGGGTCCCGC TGGTGAACGT GGTTCTCCTG GCCCCGCTGG CCCCAAAGGA 1080 

TCTCCTGGTG AAGCTGGTCG TCCCGGTGAA GCTGGTCTGC CTGGTGCCAA GGGTCTGACT 1140 

GGAAGCCCTG GCAGCCCTGG TCCTGATGGC AAAACTGGCC CCCCTGGTCC CGCCGGTCAA 1200 

GATGGTCGCC CCGGACCCCC AGGCCCACCT GGTGCCCGTG GTCAGGCTGG TGTGATGGGA 1260

TTCCCTGGAC CTAAAGGTGC TGCTGGAGAG CCCGGCAAGG CTGGAGAGCG AGGTGTTCCC 1320 

GGACCCCCTG GCGCTGTCGG TCCTGCTGGC AAAGATGGAG AGGCTGGAGC TCAGGGACCC 1380 

CCTGGCCCTG CTGGTCCCGC TGGCGAGAGA GGTGAACAAG GCCCTGCTGG CTCCCCCGGA 1440 

TTCCAGGGTC TCCCTGGTCC TGCTGGTCCT CCAGGTGAAG CAGGCAAACC TGGTGAACAG 1500 

GGTGTTCCTG GAGACCTTGG CGCCCCTGGC CCCTCTGGAG CAAGAGGCGA GAGAGGTTTC 1560

CCTGGCGAGC GTGGTGTGCA AGGTCCCCCT GGTCCTGCTG GACCCCGAGG GGCCAACGGT 1620

GCTCCCGGCA ACGATGGTGC TAAGGGTGAT GCTGGTGCCC CTGGAGCTCC CGGTAGCCAG 1680



GGCGCCCCTG GCCTTCAGGG AATGCCTGGT GAACGTGGTG CAGCTGGTCT TCCAGGGCCT 1740 



AAGGGTGACA GAGGTGATGC TGGTCCCAAA GGTGCTGATG GCTCTCCTGG CAAAGATGGC 1800 



GTCCGTGGTC TGACCGGCCC CATTGGTCCT CCTGGCCCTG CTGGTGCCCC TGGTGACAAG 1860 



GGTGAAAGTG GTCCCAGCGG CCCTGCTGGT CCCACTGGAG CTCGTGGTGC CCCCGGAGAC 1920

CGTGGTGAGC CTGGTCCCCC CGGCCCTGCT GGCTTTGCTG GCCCCCCTGG TGCTGACGGC 1980

CAACCTGGTG CTAAAGGCGA ACCTGGTGAT GCTGGTGCCA AAGGCGATGC TGGTCCCCCT 2040 

GGGCCTGCCG GACCCGCTGG ACCCCCTGGC CCCATTGGTA ATGTTGGTGC TCCTGGAGCC 2100 

AAAGGTGCTC GCGGCAGCGC TGGTCCCCCT GGTGCTACTG GTTTCCCTGG TGCTGCTGGC 2160

CGAGTCGGTC CTCCTGGCCC CTCTGGAAAT GCTGGACCCC CTGGCCCTCC TGGTCCTGCT 2220 

GGCAAAGAAG GCGGCAAAGG TCCCCGTGGT GAGACTGGCC CTGCTGGACG TCCTGGTGAA 2280 

GTTGGTCCCC CTGGTCCCCC TGGCCCTGCT GGCGAGAAAG GATCCCCTGG TGCTGATGGT 2340 

CCTGCTGGTG CTCCTGGTAC TCCCGGGCCT CAAGGTATTG CTGGACAGCG TGGTGTGGTC 2400

GGCCTGCCTG GTCAGAGAGG AGAGAGAGGC TTCCCTGGTC TTCCTGGCCC CTCTGGTGAA 2460 

CCTGGCAAAC AAGGTCCCTC TGGAGCAAGT GGTGAACGTG GTCCCCCCGG TCCCATGGGC 2520 

CCCCCTGGAT TGGCTGGACC CCCTGGTGAA TCTGGACGTG AGGGGGCTCC TGCTGCCGAA 2580

GGTTCCCCTG GACGAGACGG TTCTCCTGGC GCCAAGGGTG ACCGTGGTGA GACCGGCCCC 2640 

GCTGGACCCC CTGGTGCTCC TGGTGCTCCT GGTGCCCCTG GCCCCGTTGG CCCTGCTGGC 2700 

AAGAGTGGTG ATCGTGGTGA GACTGGTCCT GCTGGTCCCG CCGGTCCCGT CGGCCCCGCT 2760 

GGCGCCCGTG GCCCCGCCGG ACCCCAAGGC CCCCGTGGTG ACAAGGGTGA GACAGGCGAA 2820 

CAGGGCGACA GAGGCATAAA GGGTCACCGT GGCTTCTCTG GCCTCCAGGG TCCCCCTGGC 2880 

CCTCCTGGCT CTCCTGGTGA ACAAGGTCCC TCTGGAGCCT CTGGTCCTGC TGGTCCCCGA 2940

GGTCCCCCTG GCTCTGCTGG TGCTCCTGGC AAAGATGGAC TCAACGGTCT CCCTGGCCCC 3000

ATTGGGCCCC CTGGTCCTCG CGGTCGCACT GGTGATGCTG GTCCTGTTGG TCCCCCCGGC 3060 

CCTCCTGGAC CTCCTGGTCC CCCTGGTCCT CCCAGCGCTG GTTTCGACTT CAGCTTCCTC 3120 

CCCCAGCCAC CTCAAGAGAA GGCTCACGAT GGTGGCCGCT ACTACCGGGC T 3171 
(2) INFORMATION FOR SEQ ID NO: 16: 



(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 1057 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:

Gln Leu Ser Tyr Gly Tyr Asp Glu Lys Ser Thr Gly Gly Ile Ser Val
1 5 10 15

Pro Gly Pro Met Gly Pro Ser Gly Pro Arg Gly Leu Pro Gly Pro Pro 
20 25 30 



Gly Ala Pro Gly Pro Gln Gly Phe Gln Gly Pro Pro Gly Glu Pro Gly
35 40 45 

Glu Pro Gly Ala Ser Gly Pro Met Gly Pro Arg Gly Pro Pro Gly Pro 
50 55 60

Pro Gly Lys Asn Gly Asp Asp Gly Glu Ala Gly Lys Pro Gly Arg Pro
65 70 75 80

Gly Glu Arg Gly Pro Pro Gly Pro Gln Gly Ala Arg Gly Leu Pro Gly
85 90 95 

Thr Ala Gly Leu Pro Gly Met Lys Gly His Arg Gly Phe Ser Gly Leu 
100 105 110 

Asp Gly Ala Lys Gly Asp Ala Gly Pro Ala Gly Pro Lys Gly Glu Pro 
115 120 125 

Gly Ser Pro Gly Glu Asn Gly Ala Pro Gly Gln Met Gly Pro Arg Gly
130 135 140 

Leu Pro Gly Glu Arg Gly Arg Pro Gly Ala Pro Gly Pro Ala Gly Ala 
145 150 155 160 

Arg Gly Asn Asp Gly Ala Thr Gly Ala Ala Gly Pro Pro Gly Pro Thr 
165 170 175 

Gly Pro Ala Gly Pro Pro Gly Phe Pro Gly Ala Val Gly Ala Lys Gly
180 185 190

Glu Ala Gly Pro Gln Gly Pro Arg Gly Ser Glu Gly Pro Gln Gly Val
195 200 205

Arg Gly Glu Pro Gly Pro Pro Gly Pro Ala Gly Ala Ala Gly Pro Ala
210 215 220

Gly Asn Pro Gly Ala Asp Gly Gln Pro Gly Ala Lys Gly Ala Asn Gly
225 230 235 240

Ala Pro Gly Ile Ala Gly Ala Pro Gly Phe Pro Gly Ala Arg Gly Pro
245 250 255 

Ser Gly Pro Gln Gly Pro Gly Gly Pro Pro Gly Pro Lys Gly Asn Ser
260 265 270 

Gly Glu Pro Gly Ala Pro Gly Ser Lys Gly Asp Thr Gly Ala Lys Gly 
275 280 285 

Glu Pro Gly Pro Val Gly Val Gln Gly Pro Pro Gly Pro Ala Gly Glu
290 295 300 

Glu Gly Lys Arg Gly Ala Arg Gly Glu Pro Gly Pro Thr Gly Leu Pro 
305 310 315 320 

Gly Pro Pro Gly Glu Arg Gly Gly Pro Gly Ser Arg Gly Phe Pro Gly 
325 330 335 

Ala Asp Gly Val Ala Gly Pro Lys Gly Pro Ala Gly Glu Arg Gly Ser 
340 345 350 

Pro Gly Pro Ala Gly Pro Lys Gly Ser Pro Gly Glu Ala Gly Arg Pro 
355 360 365 

Gly Glu Ala Gly Leu Pro Gly Ala Lys Gly Leu Thr Gly Ser Pro Gly 
370 375 380

Ser Pro Gly Pro Asp Gly Lys Thr Gly Pro Pro Gly Pro Ala Gly Gln
385 390 395 400

Asp Gly Arg Pro Gly Pro Pro Gly Pro Pro Gly Ala Arg Gly Gln Ala
405 410 415






Gly Val Met Gly Phe Pro Gly Pro Lys Gly Ala Ala Gly Glu Pro Gly 
420 425 430 

Lys Ala Gly Glu Arg Gly Val Pro Gly Pro Pro Gly Ala Val Gly Pro 
435 440 445 

Ala Gly Lys Asp Gly Glu Ala Gly Ala Gln Gly Pro Pro Gly Pro Ala
450 455 460

Gly Pro Ala Gly Glu Arg Gly Glu Gln Gly Pro Ala Gly Ser Pro Gly
465 470 475 480

Phe Gln Gly Leu Pro Gly Pro Ala Gly Pro Pro Gly Glu Ala Gly Lys
485 490 495

Pro Gly Glu Gln Gly Val Pro Gly Asp Leu Gly Ala Pro Gly Pro Ser
500 505 510

Gly Ala Arg Gly Glu Arg Gly Phe Pro Gly Glu Arg Gly Val Gln Gly
515 520 525

Pro Pro Gly Pro Ala Gly Pro Arg Gly Ala Asn Gly Ala Pro Gly Asn
530 535 540

Asp Gly Ala Lys Gly Asp Ala Gly Ala Pro Gly Ala Pro Gly Ser Gln
545 550 555 560

Gly Ala Pro Gly Leu Gln Gly Met Pro Gly Glu Arg Gly Ala Ala Gly
565 570 575 

Leu Pro Gly Pro Lys Gly Asp Arg Gly Asp Ala Gly Pro Lys Gly Ala 
580 585 590

Asp Gly Ser Pro Gly Lys Asp Gly Val Arg Gly Leu Thr Gly Pro Ile
595 600 605

Gly Pro Pro Gly Pro Ala Gly Ala Pro Gly Asp Lys Gly Glu Ser Gly
610 615 620

Pro Ser Gly Pro Ala Gly Pro Thr Gly Ala Arg Gly Ala Pro Gly Asp
625 630 635 640

Arg Gly Glu Pro Gly Pro Pro Gly Pro Ala Gly Phe Ala Gly Pro Pro
645 650 655

Gly Ala Asp Gly Gln Pro Gly Ala Lys Gly Glu Pro Gly Asp Ala Gly
660 665 670

Ala Lys Gly Asp Ala Gly Pro Pro Gly Pro Ala Gly Pro Ala Gly Pro
675 680 685

Pro Gly Pro Ile Gly Asn Val Gly Ala Pro Gly Ala Lys Gly Ala Arg
690 695 700 

Gly Ser Ala Gly Pro Pro Gly Ala Thr Gly Phe Pro Gly Ala Ala Gly 
705 710 715 720 

Arg Val Gly Pro Pro Gly Pro Ser Gly Asn Ala Gly Pro Pro Gly Pro 
725 730 735 

Pro Gly Pro Ala Gly Lys Glu Gly Gly Lys Gly Pro Arg Gly Glu Thr 
740 745 750 

Gly Pro Ala Gly Arg Pro Gly Glu Val Gly Pro Pro Gly Pro Pro Gly 
755 760 765

Pro Ala Gly Glu Lys Gly Ser Pro Gly Ala Asp Gly Pro Ala Gly Ala
770 775 780

Pro Gly Thr Pro Gly Pro Gln Gly Ile Ala Gly Gln Arg Gly Val Val
785 790 795 800

Gly Leu Pro Gly Gln Arg Gly Glu Arg Gly Phe Pro Gly Leu Pro Gly
805 810 815

Pro Ser Gly Glu Pro Gly Lys Gln Gly Pro Ser Gly Ala Ser Gly Glu
820 825 830 

Arg Gly Pro Pro Gly Pro Met Gly Pro Pro Gly Leu Ala Gly Pro Pro 
835 840 845 

Gly Glu Ser Gly Arg Glu Gly Ala Pro Ala Ala Glu Gly Ser Pro Gly 
850 855 860 

Arg Asp Gly Ser Pro Gly Ala Lys Gly Asp Arg Gly Glu Thr Gly Pro 
865 870 875 880 

Ala Gly Pro Pro Gly Ala Pro Gly Ala Pro Gly Ala Pro Gly Pro Val 
885 890 895 

Gly Pro Ala Gly Lys Ser Gly Asp Arg Gly Glu Thr Gly Pro Ala Gly 
900 905 910 

Pro Ala Gly Pro Val Gly Pro Ala Gly Ala Arg Gly Pro Ala Gly Pro 
915 920 925 

Gln Gly Pro Arg Gly Asp Lys Gly Glu Thr Gly Glu Gln Gly Asp Arg
930 935 940

Gly Ile Lys Gly His Arg Gly Phe Ser Gly Leu Gln Gly Pro Pro Gly
945 950 955 960

Pro Pro Gly Ser Pro Gly Glu Gln Gly Pro Ser Gly Ala Ser Gly Pro
965 970 975

Ala Gly Pro Arg Gly Pro Pro Gly Ser Ala Gly Ala Pro Gly Lys Asp
980 985 990

Gly Leu Asn Gly Leu Pro Gly Pro Ile Gly Pro Pro Gly Pro Arg Gly
995 1000 1005 

Arg Thr Gly Asp Ala Gly Pro Val Gly Pro Pro Gly Pro Pro Gly Pro 
1010 1015 1020 

Pro Gly Pro Pro Gly Pro Pro Ser Ala Gly Phe Asp Phe Ser Phe Leu 
1025 1030 1035 1040 

Pro Gln Pro Pro Gln Glu Lys Ala His Asp Gly Gly Arg Tyr Tyr Arg
1045 1050 1055 

Ala 

(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 46 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: peptide

(ix) FEATURE:

(A) NAME/KEY: Region

(B) LOCATION: 1..2

(D) OTHER INFORMATION: /note= "Amino acid sequence for
glutathione S-transferase"

(ix) FEATURE:

(A) NAME/KEY: Region

(B) LOCATION: 19..20

(D) OTHER INFORMATION: /note= "338 repeats of the
following triplet Gly-X-Y wherein about 35% of the X and Y
positions are occupied by proline and 4-hydroxyproline."



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 



Xaa Met Gln Leu Ser Tyr Gly Tyr Asp Glu Lys Ser Thr Gly Gly Ile
1 5 10 15

Ser Val Pro Xaa Ser Ala Gly Phe Asp Phe Ser Phe Leu Pro Gln Pro
20 25 30

Pro Gln Glu Lys Ala His Asp Gly Gly Arg Tyr Tyr Arg Ala
35 40 45 

(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 31 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: unknown 



(ii) MOLECULE TYPE: peptide

(ix) FEATURE:

(A) NAME/KEY: Region

(B) LOCATION: 1..2

(D) OTHER INFORMATION: /note= "Amino acid sequence for
glutathione S-transferase."

(ix) FEATURE:

(A) NAME/KEY: Region

(B) LOCATION: 4..5

(D) OTHER INFORMATION: /note= "338 repeats of the
following triplet Gly-X-Y wherein about 35% of the X and Y
positions are occupied by proline and 4-hydroxyproline."



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

Xaa Met Gly Xaa Tyr Ser Ala Gly Phe Asp Phe Ser Phe Leu Pro Gln
1 5 10 15

Pro Pro Gln Glu Lys Ala His Asp Gly Gly Arg Tyr Tyr Arg Ala
20 25 30 

(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3171 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:

CAGCTGAGCT ATGGCTATGA TGAAAAAAGC ACCGGCGGCA TCAGCGTGCC GGGCCCGATG 60

GGTCCGAGCG GCCCTCGTGG CCTGCCGGGC CCGCCAGGTG CGCCCGGTCC GCAGGGCTTT 120

CAGGGTCCGC CGGGCGAACC GGGCGAACCT GGTCCGAGCG GCCCGATGGG CCCGCGCGGC 180 

CCGCCGGGTC CGCCAGGCAA AAACGGCGAT GATGGCGAAG CGGGCAAACC GGGACGTCCG 240

GGTGAACGTG GCCCCCCGGG CCCGCAGGGC GCGCGCGGAC TGCCGGGTAC TGCGGGACTG 300 

CCGGGCATGA AAGGCCACCG CGGTTTCTCT GGTCTGGATG GTGCGAAAGG TGATGCGGGT 360

CCGGCGGGTC CGAAAGGTGA GCCGGGCAGC CCGGGCGAAA ACGGCGCGCC GGGTCAGATG 420 

GGCCCGCGTG GCCTGCCTGG TGAACGCGGT CGCCCGGGCG CCCCGGGCCC AGCTGGCGCA 480 

CGTGGCAACG ATGGTGCGAC CGGTGCGGCC GGTCCACCGG GCCCGACGGG CCCGGCGGGT 540

CCCCCGGGCT TTCCGGGTGC GGTGGGTGCG AAAGGCGAAG CAGGTCCGCA GGGGCCGCGC 600 

GGGAGCGAGG GTCCTCAGGG CGTTCGTGGT GAACCGGGCC CGCCGGGCCC GGCGGGTGCG 660 

GCGGGCCCGG CTGGTAACCC TGGCGCGGAC GGTCAGCCAG GTGCGAAAGG TGCCAACGGC 720 

GCGCCGGGTA TTGCAGGTGC ACCGGGCTTC CCGGGTGCCC GCGGCCCGTC CGGCCCGCAG 780 

GGCCCGGGCG GCCCGCCCGG CCCGAAAGGG AACAGCGGTG AACCGGGTGC GCCAGGCAGC 840

AAAGGCGACA CCGGTGCGAA AGGTGAACCG GGCCCAGTGG GTGTTCAAGG CCCGCCGGGC 900 

CCGGCGGGCG AGGAAGGCAA ACGCGGTGCT CGCGGTGAAC CGGGCCCGAC CGGCCTGCCT 960 

GGCCCGCCGG GAGAACGTGG TGGCCCGGGT AGCCGCGGTT TTCCGGGCGC GGATGGTGTG 1020

GCGGGCCCGA AAGGTCCGGC GGGTGAACGT GGTAGCCCGG GCCCGGCGGG CCCAAAAGGC 1080

AGCCCGGGCG AGGCAGGACG TCCGGGTGAA GCGGGTCTCC CGGGCGCCAA AGGTCTGACC 1140

GGCTCTCCGG GCAGCCCGGG TCCGGATGGC AAAACGGGCC CGCCTGGTCC GGCCGGCCAG 1200 

GATGGTCGCC CGGGCCCGCC GGGCCCGCCG GGTGCCCGTG GTCAGGCGGG TGTCATGGGC 1260 
TTTCCAGGCC CCAAAGGTGC GGCGGGTGAA CCGGGCAAAG CGGGCGAACG CGGTGTCCCG 1320 

GGTCCGCCGG GCGCTGTCGG GCCGGCGGGC AAAGATGGCG AAGCGGGCGC GCAAGGCCCG 1380 
CCGGGACCAG CGGGTCCGGC GGGCGAGCGC GGTGAACAGG GCCCGGCAGG CAGCCCGGGT 1440

TTCCAGGGTC TGCCGGGCCC TGCGGGTCCA CCGGGTGAAG CGGGCAAACC GGGGGAACAA 1500 

GGTGTGCCGG GCGACCTGGG CGCCCCAGGC CCGAGCGGCG CGCGCGGCGA ACGCGGTTTC 1560 
CCGGGCGAAC GTGGTGTGCA GGGCCCGCCC GGCCCGGCTG GTCCGCGCGG CGCCAACGGC 1620 

GCGCCGGGCA ACGATGGTGC GAAAGGTGAT GCGGGTGCCC CAGGTGCGCC GGGCAGCCAG 1680 
GGCGCCCCGG GGCTGCAAGG CATGCCGGGT GAACGTGGTG CCGCGGGTCT ACCGGGTCCG 1740

AAAGGCGACC GCGGTGATGC GGGTCCAAAA GGTGCGGATG GCTCCCCTGG CAAAGATGGC 1800 

GTTCGTGGTC TGACCGGCCC GATCGGCCCG CCGGGCCCGG CAGGTGCCCC GGGTGACAAA 1860 
GGTGAAAGCG GTCCGAGCGG CCCAGCGGGC CCCACTGGTG CGCGTGGTGC CCCGGGCGAC 1920 

CGTGGTGAAC CGGGTCCGCC GGGCCCGGCG GGCTTTGCGG GCCCGCCAGG CGCTGACGGC 1980 

CAGCCGGGTG CGAAAGGCGA ACCGGGGGAT GCGGGTGCTA AAGGCGACGC GGGTCCGCCG 2040

GGCCCTGCCG GCCCGGCGGG CCCGCCAGGC CCGATTGGCA ACGTGGGTGC GCCGGGTGCC 2100

AAAGGTGCGC GCGGCAGCGC TGGTCCGCCG GGCGCGACCG GTTTCCCCGG TGCGGCGGGG 2160

CGCGTGGGTC CGCCAGGCCC GAGCGGTAAC GCGGGTCCGC CAGGTCCGCC TGGCCCGGCT 2220 

GGCAAAGAGG GCGGCAAAGG TCCGCGTGGT GAAACCGGCC CTGCGGGACG TCCAGGTGAA 2280 

GTGGGTCCGC CGGGCCCGCC GGGCCCGGCG GGCGAAAAAG GTAGCCCGGG TGCGGATGGT 2340 

CCCGCCGGTG CGCCAGGCAC GCCGGGTCCG CAAGGTATCG CTGGCCAGCG TGGTGTCGTC 2400 

GGGCTGCCGG GTCAGCGCGG CGAACGCGGC TTTCCGGGTC TGCCGGGCCC GAGCGGTGAG 2460 

CCGGGCAAAC AGGGTCCATC TGGCGCGAGC GGTGAACGTG GCCCGCCGGG TCCCATGGGC 2520 

CCGCCGGGTC TGGCGGGCCC TCCGGGTGAA AGCGGTCGTG AAGGCGCGCC GGGTGCCGAA 2580 

GGCAGCCCAG GCCGCGACGG TAGCCCGGGG GCCAAAGGGG ATCGTGGTGA AACCGGCCCG 2640 

GCGGGCCCCC CGGGTGCACC GGGCGCGCCG GGTGCCCCAG GCCCGGTGGG CCCGGCGGGC 2700 

AAAAGCGGTG ATCGTGGTGA GACCGGTCCG GCGGGCCCGG CCGGTCCGGT GGGCCCAGCG 2760 

GGCGCCCGTG GCCCGGCCGG TCCGCAGGGC CCGCGGGGTG ACAAAGGTGA AACGGGCGAA 2820 

CAGGGCGACC GTGGCATTAA AGGCCACCGT GGCTTCAGCG GCCTGCAGGG TCCACCGGGC 2880 

CCGCCGGGCA GTCCGGGTGA ACAGGGTCCG TCCGGAGCCA GCGGGCCGGC GGGCCCACGC 2940

GGTCCGCCGG GCAGCGCGGG CGCGCCGGGC AAAGACGGTC TGAACGGTCT GCCGGGCCCG 3000

ATCGGCCCGC CGGGCCCACG CGGCCGCACC GGTGATGCGG GTCCGGTGGG TCCCCCGGGC 3060

CCGCCGGGCC CGCCAGGCCC GCCGGGACCG CCGAGCGCGG GTTTCGACTT CAGCTTCCTG 3120 

CCGCAGCCGC CGCAGGAGAA AGCGCACGAC GGCGGTCGCT ACTACCGTGC G 3171 
(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1057 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 

Gln Leu Ser Tyr Gly Tyr Asp Glu Lys Ser Thr Gly Gly Ile Ser Val
1 5 10 15 

Pro Gly Pro Met Gly Pro Ser Gly Pro Arg Gly Leu Pro Gly Pro Pro 
20 25 30 

Gly Ala Pro Gly Pro Gln Gly Phe Gln Gly Pro Pro Gly Glu Pro Gly
35 40 45 

Glu Pro Gly Ala Ser Gly Pro Met Gly Pro Arg Gly Pro Pro Gly Pro 
50 55 60

Pro Gly Lys Asn Gly Asp Asp Gly Glu Ala Gly Lys Pro Gly Arg Pro
65 70 75 80

Gly Glu Arg Gly Pro Pro Gly Pro Gln Gly Ala Arg Gly Leu Pro Gly
85 90 95 

Thr Ala Gly Leu Pro Gly Met Lys Gly His Arg Gly Phe Ser Gly Leu 
100 105 110 

Asp Gly Ala Lys Gly Asp Ala Gly Pro Ala Gly Pro Lys Gly Glu Pro 
115 120 125 

Gly Ser Pro Gly Glu Asn Gly Ala Pro Gly Gln Met Gly Pro Arg Gly
130 135 140 

Leu Pro Gly Glu Arg Gly Arg Pro Gly Ala Pro Gly Pro Ala Gly Ala 
145 150 155 160 

Arg Gly Asn Asp Gly Ala Thr Gly Ala Ala Gly Pro Pro Gly Pro Thr 
165 170 175 

Gly Pro Ala Gly Pro Pro Gly Phe Pro Gly Ala Val Gly Ala Lys Gly 
180 185 190 

Glu Ala Gly Pro Gln Gly Pro Arg Gly Ser Glu Gly Pro Gln Gly Val
195 200 205

Arg Gly Glu Pro Gly Pro Pro Gly Pro Ala Gly Ala Ala Gly Pro Ala
210 215 220

Gly Asn Pro Gly Ala Asp Gly Gln Pro Gly Ala Lys Gly Ala Asn Gly
225 230 235 240

Ala Pro Gly Ile Ala Gly Ala Pro Gly Phe Pro Gly Ala Arg Gly Pro
245 250 255 



Ser Gly Pro Gln Gly Pro Gly Gly Pro Pro Gly Pro Lys Gly Asn Ser
260 265 270

Gly Glu Pro Gly Ala Pro Gly Ser Lys Gly Asp Thr Gly Ala Lys Gly 
275 280 285 

Glu Pro Gly Pro Val Gly Val Gln Gly Pro Pro Gly Pro Ala Gly Glu
290 295 300 

Glu Gly Lys Arg Gly Ala Arg Gly Glu Pro Gly Pro Thr Gly Leu Pro 
305 310 315 320 

Gly Pro Pro Gly Glu Arg Gly Gly Pro Gly Ser Arg Gly Phe Pro Gly 
325 330 335

Ala Asp Gly Val Ala Gly Pro Lys Gly Pro Ala Gly Glu Arg Gly Ser 
340 345 350 

Pro Gly Pro Ala Gly Pro Lys Gly Ser Pro Gly Glu Ala Gly Arg Pro 
355 360 365

Gly Glu Ala Gly Leu Pro Gly Ala Lys Gly Leu Thr Gly Ser Pro Gly
370 375 380

Ser Pro Gly Pro Asp Gly Lys Thr Gly Pro Pro Gly Pro Ala Gly Gln
385 390 395 400

Asp Gly Arg Pro Gly Pro Pro Gly Pro Pro Gly Ala Arg Gly Gln Ala
405 410 415

Gly Val Met Gly Phe Pro Gly Pro Lys Gly Ala Ala Gly Glu Pro Gly
420 425 430

Lys Ala Gly Glu Arg Gly Val Pro Gly Pro Pro Gly Ala Val Gly Pro
435 440 445

Ala Gly Lys Asp Gly Glu Ala Gly Ala Gln Gly Pro Pro Gly Pro Ala
450 455 460

Gly Pro Ala Gly Glu Arg Gly Glu Gln Gly Pro Ala Gly Ser Pro Gly
465 470 475 480

Phe Gln Gly Leu Pro Gly Pro Ala Gly Pro Pro Gly Glu Ala Gly Lys
485 490 495

Pro Gly Glu Gln Gly Val Pro Gly Asp Leu Gly Ala Pro Gly Pro Ser
500 505 510

Gly Ala Arg Gly Glu Arg Gly Phe Pro Gly Glu Arg Gly Val Gln Gly
515 520 525

Pro Pro Gly Pro Ala Gly Pro Arg Gly Ala Asn Gly Ala Pro Gly Asn
530 535 540

Asp Gly Ala Lys Gly Asp Ala Gly Ala Pro Gly Ala Pro Gly Ser Gln
545 550 555 560

Gly Ala Pro Gly Leu Gln Gly Met Pro Gly Glu Arg Gly Ala Ala Gly
565 570 575 

Leu Pro Gly Pro Lys Gly Asp Arg Gly Asp Ala Gly Pro Lys Gly Ala 
580 585 590

Asp Gly Ser Pro Gly Lys Asp Gly Val Arg Gly Leu Thr Gly Pro Ile
595 600 605

Gly Pro Pro Gly Pro Ala Gly Ala Pro Gly Asp Lys Gly Glu Ser Gly 
610 615 620 

Pro Ser Gly Pro Ala Gly Pro Thr Gly Ala Arg Gly Ala Pro Gly Asp 
625 630 635 640 

Arg Gly Glu Pro Gly Pro Pro Gly Pro Ala Gly Phe Ala Gly Pro Pro 
645 650 655 

Gly Ala Asp Gly Gln Pro Gly Ala Lys Gly Glu Pro Gly Asp Ala Gly
660 665 670

Ala Lys Gly Asp Ala Gly Pro Pro Gly Pro Ala Gly Pro Ala Gly Pro
675 680 685

Pro Gly Pro Ile Gly Asn Val Gly Ala Pro Gly Ala Lys Gly Ala Arg
690 695 700

Gly Ser Ala Gly Pro Pro Gly Ala Thr Gly Phe Pro Gly Ala Ala Gly
705 710 715 720

Arg Val Gly Pro Pro Gly Pro Ser Gly Asn Ala Gly Pro Pro Gly Pro 
725 730 735 

Pro Gly Pro Ala Gly Lys Glu Gly Gly Lys Gly Pro Arg Gly Glu Thr 
740 745 750 

Gly Pro Ala Gly Arg Pro Gly Glu Val Gly Pro Pro Gly Pro Pro Gly 
755 760 765

Pro Ala Gly Glu Lys Gly Ser Pro Gly Ala Asp Gly Pro Ala Gly Ala
770 775 780

Pro Gly Thr Pro Gly Pro Gln Gly Ile Ala Gly Gln Arg Gly Val Val
785 790 795 800

Gly Leu Pro Gly Gln Arg Gly Glu Arg Gly Phe Pro Gly Leu Pro Gly
805 810 815 

Pro Ser Gly Glu Pro Gly Lys Gln Gly Pro Ser Gly Ala Ser Gly Glu
820 825 830 

Arg Gly Pro Pro Gly Pro Met Gly Pro Pro Gly Leu Ala Gly Pro Pro 
835 840 845 

Gly Glu Ser Gly Arg Glu Gly Ala Pro Gly Ala Glu Gly Ser Pro Gly 
850 855 860 

Arg Asp Gly Ser Pro Gly Ala Lys Gly Asp Arg Gly Glu Thr Gly Pro 
865 870 875 880 

Ala Gly Pro Pro Gly Ala Pro Gly Ala Pro Gly Ala Pro Gly Pro Val 
885 890 895

Gly Pro Ala Gly Lys Ser Gly Asp Arg Gly Glu Thr Gly Pro Ala Gly 
900 905 910 

Pro Ala Gly Pro Val Gly Pro Ala Gly Ala Arg Gly Pro Ala Gly Pro 
915 920 925 



Gln Gly Pro Arg Gly Asp Lys Gly Glu Thr Gly Glu Gln Gly Asp Arg
930 935 940

Gly Ile Lys Gly His Arg Gly Phe Ser Gly Leu Gln Gly Pro Pro Gly
945 950 955 960

Pro Pro Gly Ser Pro Gly Glu Gln Gly Pro Ser Gly Ala Ser Gly Pro
965 970 975

Ala Gly Pro Arg Gly Pro Pro Gly Ser Ala Gly Ala Pro Gly Lys Asp
980 985 990

Gly Leu Asn Gly Leu Pro Gly Pro Ile Gly Pro Pro Gly Pro Arg Gly
995 1000 1005

Arg Thr Gly Asp Ala Gly Pro Val Gly Pro Pro Gly Pro Pro Gly Pro
1010 1015 1020

Pro Gly Pro Pro Gly Pro Pro Ser Ala Gly Phe Asp Phe Ser Phe Leu
1025 1030 1035 1040

Pro Gln Pro Pro Gln Glu Lys Ala His Asp Gly Gly Arg Tyr Tyr Arg
1045 1050 1055 



Ala 



(2) INFORMATION FOR SEQ ID NO:21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 79 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:
GGAATTCATG CAGCTGAGCT ATGGCTATGA TGAAAAAAGC ACCGGCGGCA TCAGCGTGCC 
GGGCCCGATG GGTCCGAGC 
(2) INFORMATION FOR SEQ ID NO: 22:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 75 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 
GGCCCGGGCT ACCCAGGCTC GCCGGGCGCA CCGGACGGCC CGGGCGGTCC AGCGGGGCCA 
GCATTATTCG AACCC 

(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 81 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 
GGAATTCCGG GTCCGCAGGG CTTTCAGGGT CCGCCGGGCG AACCTGGTGC GAGCGGCCCG 
ATGGGCCCGC GCGGCCCGCC C 
(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 87 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 
TACCCGGGCG CGCCGGGCGG CCCAGGCGGT CCGTTTTTGC CGCTACTACC GTTCGCCCGT 
TTGGCCCTGC AGGCATTATT CGAACCC 
(2) INFORMATION FOR SEQ ID NO:25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 111 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 
CAGCTGAGCT ATGGCTATGA TGAAAAAAGC ACCGGCGGCA TCAGCGTGCC GGGCCCGATG 
GGTCCGAGCG GCCCTCGTGG CCTGCCGGGC CCGCCAGGTG CGCCCGGTCC G 
(2) INFORMATION FOR SEQ ID NO:26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 37 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 

Gln Leu Ser Tyr Gly Tyr Asp Glu Lys Ser Thr Gly Gly Ile Ser Val
1 5 10 15

Pro Gly Pro Met Gly Pro Ser Gly Pro Arg Gly Leu Pro Gly Pro Pro 
20 25 30 

Gly Ala Pro Gly Pro 
35 

(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 240 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 

CAGCTGAGCT ATGGCTATGA TGAAAAAAGC ACCGGCGGCA TCAGCGTGCC GGGCCCGATG 60 

GGTCCGAGCG GCCCTCGTGG CCTGCCGGGC CCGCCAGGTG CGCCCGGTCC GCAGGGCTTT 120 

CAGGGTCCGC CGGGCGAACC GGGCGAACCT GGTGCGAGCG GCCCGATGGG CCCGCGCGGC 180 

CCGCCGGGTC CGCCAGGCAA AAACGGCGAT GATGGCGAAG CGGGCAAACC GGGACGTCCG 240

(2) INFORMATION FOR SEQ ID NO:28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 80 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:28:

Gln Leu Ser Tyr Gly Tyr Asp Glu Lys Ser Thr Gly Gly Ile Ser Val
1 5 10 15

Pro Gly Pro Met Gly Pro Ser Gly Pro Arg Gly Leu Pro Gly Pro Pro
20 25 30

Gly Ala Pro Gly Pro Gln Gly Phe Gln Gly Pro Pro Gly Glu Pro Gly
35 40 45

Glu Pro Gly Ala Ser Gly Pro Met Gly Pro Arg Gly Pro Pro Gly Pro
50 55 60

Pro Gly Lys Asn Gly Asp Asp Gly Glu Ala Gly Lys Pro Gly Arg Pro
65 70 75 80

(2) INFORMATION FOR SEQ ID NO: 29:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3120 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 
CAGTATGATG GAAAAGGAGT TGGACTTGGC CCTGGACCAA TGGGCTTAAT GGGACCTAGA 60

GGCCCACCTG GTGCAGCTGG AGCCCCAGGC CCTCAAGGTT TCCAAGGACC TGCTGGTGAG 120

CCTGGTGAAC CTGGTCAAAC TGGTCCTGCA GGTGCTCGTG GTCCAGCTGG CCCTCCTGGC 180

AAGGCTGGTG AAGATGGTCA CCCTGGAAAA CCCGGACGAC CTGGTGAGAG AGGAGTTGTT 240

GGACCACAGG GTGCTCGTGG TTTCCCTGGA ACTCCTGGAC TTCCTGGCTT CAAAGGCATT 300

AGGGGACACA ATGGTCTGGA TGGATTGAAG GGACAGCCCG GTGCTCCTGG TGTGAAGGGT 360

GAACCTGGTG CCCCTGGTGA AAATGGAACT CCAGGTCAAA CAGGAGCCCG TGGGCTTCCT 420

GGTGAGAGAG GACGTGTTGG TGCCCCTGGC CCAGCTGGTG CCCGTGGCAG TGATGGAAGT 480

GTGGGTCCCG TGGGTCCTGC TGGTCCCATT GGGTCTGCTG GCCCTCCAGG CTTCCCAGGT 540

GCCCCTGGCC CCAAGGGTGA AATTGGAGCT GTTGGTAACG CTGGTCCTGC TGGTCCCGCC 600

GGTCCCCGTG GTGAAGTGGG TCTTCCAGGC CTCTCCGGCC CCGTTGGACC TCCTGGTAAT 660

CCTGGAGCAA ACGGCCTTAC TGGTGCCAAG GGTGCTGCTG GCCTTCCCGG CGTTGCTGGG 720

GCTCCCGGCC TCCCTGGACC CCGCGGTATT CCTGGCCCTG TTGGTGCTGC CGGTGCTACT 780

GGTGCCAGAG GACTTGTTGG TGAGCCTGGT CCAGCTGGCT CCAAAGGAGA GAGCGGTAAC 840

AAGGGTGAGC CCGGCTCTGC TGGGCCCCAA GGTCCTCCTG GTCCCAGTGG TGAAGAAGGA 900

AAGAGAGGCC CTAATGGGGA AGCTGGATCT GCCGGCCCTC CAGGACCTCC TGGGCTGAGA 960

GGTAGTCCTG GTTCTCGTGG TCTTCCTGGA GCTGATGGCA GAGCTGGCGT CATGGGCCCT 1020

CCTGGTAGTC GTGGTGCAAG TGGCCCTGCT GGAGTCCGAG GACCTAATGG AGATGCTGGT 1080

CGCCCTGGGG AGCCTGGTCT CATGGGACCC AGAGGTCTTC CTGGTTCCCC TGGAAATATC 1140

GGCCCCGCTG GAAAAGAAGG TCCTGTCGGC CTCCCTGGCA TCGACGGCAG GCCTGGCCCA 1200

ATTGGCCCAG CTGGAGCAAG AGGAGAGCCT GGCAACATTG GATTCCCTGG ACCCAAAGGC 1260

CCCACTGGTG ATCCTGGCAA AAACGGTGAT AAAGGTCATG CTGGTCTTGC TGGTGCTCGG 1320

GGTGCTCCAG GTCCTGATGG AAACAATGGT GCTCAGGGAC CTCCTGGACC ACAGGGTGTT 1380

CAAGGTGGAA AAGGTGAACA GGGTCCCGCT GGTCCTCCAG GCTTCCAGGG TCTGCCTGGC 1440

CCCTCAGGTC CCGCTGGTGA AGTTGGCAAA CCAGGAGAAA GGGGTCTCCA TGGTGAGTTT 1500

GGTCTCCCTG GTCCTGCTGG TCCAAGAGGG GAACGCGGTC CCCCAGGTGA GAGTGGTGCT 1560

GCCGGTCCTA CTGGTCCTAT TGGAAGCCGA GGTCCTTCTG GACCCCCAGG GCCTGATGGA 1620

AACAAGGGTG AACCTGGTGT GGTTGGTGCT GTGGGCACTG CTGGTCCATC TGGTCCTAGT 1680

GGACTCCCAG GAGAGAGGGG TGCTGCTGGC ATACCTGGAG GCAAGGGAGA AAAGGGTGAA 1740

CCTGGTCTCA GAGGTGAAAT TGGTAACCCT GGCAGAGATG GTGCTCGTGG TGCTCATGGT 1800

GCTGTAGGTG CCCCTGGTCC TGCTGGAGCC ACAGGTGACC GGGGCGAAGC TGGGGCTGCT 1860

GGTCCTGCTG GTCCTGCTGG TCCTCGGGGA AGCCCTGGTG AACGTGGCGA GGTCGGTCCT 1920

GCTGGCCCCA ACGGATTTGC TGGTCCGGCT GGTGCTGCTG GTCAACCGGG TGCTAAAGGA 1980

GAAAGAGGAG CCAAAGGGCC TAAGGGTGAA AACGGTGTTG TTGGTCCCAC AGGCCCCGTT 2040

GGAGCTGCTG GCCCAGCTGG TCCAAATGGT CCCCCCGGTC CTGCTGGAAG TCGTGGTGAT 2100

GGAGGCCCCC CTGGTATGAC TGGTTTCCCT GGTGCTGCTG GACGGACTGG TCCCCCAGGA 2160

CCCTCTGGTA TTTCTGGCCC TCCTGGTCCC CCTGGTCCTG CTGGGAAAGA AGGGCTTCGT 2220

GGTCCTCGTG GTGACCAAGG TCCAGTTGGC CGAACTGGAG AAGTAGGTGC AGTTGGTCCC 2280

CCTGGCTTCG CTGGTGAGAA GGGTCCCTCT GGAGAGGCTG GTACTGCTGG ACCTCCTGGC 2340

ACTCCAGGTC CTCAGGGTCT TCTTGGTGCT CCTGGTATTC TGGGTCTCCC TGGCTCGAGA 2400

GGTGAACGTG GTCTACCTGG TGTTGCTGGT GCTGTGGGTG AACCTGGTCC TCTTGGCATT 2460 

GCCGGCCCTC CTGGGGCCCG TGGTCCTCCT GGTGCTGTGG GTAGTCCTGG AGTCAACGGT 2520 

GCTCCTGGTG AAGCTGGTCG TGATGGCAAC CCTGGGAACG ATGGTCCCCC AGGTCGCGAT 2580 

GGTCAAGCCG GACACAAGGG AGAGCGCGGT TACCCTGGCA ATATTGGTCC CGTTGGTGCT 2640 

GCAGGTGCAC CTGGTCCTCA TGGCCCCGTG GGTCCTGCTG GCAAACATGG AAACCGTGGT 2700 

GAAACTGGTC CTTCTGGTCC TGTTGGTCCT GCTGGTGCTG TTGGCCCAAG AGGTCCTAGT 2760 

GGCCCACAAG GCATTCGTGG CGATAAGGGA GAGCCCGGTG AAAAGGGGCC CAGAGGTCTT 2820 

CCTGGCTTAA AGGGACACAA TGGATTGCAA GGTCTGCCTG GTATCGCTGG TCACCATGGT 2880

GATCAAGGTG CTCCTGGCTC CGTGGGTCCT GCTGGTCCTA GGGGCCCTGC TGGTCCTTCT 2940 

GGCCCTGCTG GAAAAGATGG TCGCACTGGA CATCCTGGTA CGGTTGGACC TGCTGGCATT 3000 

CGAGGCCCTC AGGGTCACCA AGGCCCTGCT GGCCCCCCTG GTCCCCCTGG CCCTCCTGGA 3060 

CCTCCAGGTG TAAGCGGTGG TGGTTATGAC TTTGGTTACG ATGGAGACTT CTACAGGGCT 3120

(2) INFORMATION FOR SEQ ID NO: 30:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1040 amino acids 

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:

Gln Tyr Asp Gly Lys Gly Val Gly Leu Gly Pro Gly Pro Met Gly Leu
1 5 10 15

Met Gly Pro Arg Gly Pro Pro Gly Ala Ala Gly Ala Pro Gly Pro Gln
20 25 30

Gly Phe Gln Gly Pro Ala Gly Glu Pro Gly Glu Pro Gly Gln Thr Gly
35 40 45

Pro Ala Gly Ala Arg Gly Pro Ala Gly Pro Pro Gly Lys Ala Gly Glu
50 55 60

Asp Gly His Pro Gly Lys Pro Gly Arg Pro Gly Glu Arg Gly Val Val
65 70 75 80

Gly Pro Gln Gly Ala Arg Gly Phe Pro Gly Thr Pro Gly Leu Pro Gly
85 90 95

Phe Lys Gly Ile Arg Gly His Asn Gly Leu Asp Gly Leu Lys Gly Gln
100 105 110

Pro Gly Ala Pro Gly Val Lys Gly Glu Pro Gly Ala Pro Gly Glu Asn
115 120 125

Gly Thr Pro Gly Gln Thr Gly Ala Arg Gly Leu Pro Gly Glu Arg Gly
130 135 140

Arg Val Gly Ala Pro Gly Pro Ala Gly Ala Arg Gly Ser Asp Gly Ser
145 150 155 160

Val Gly Pro Val Gly Pro Ala Gly Pro Ile Gly Ser Ala Gly Pro Pro
165 170 175

Gly Phe Pro Gly Ala Pro Gly Pro Lys Gly Glu Ile Gly Ala Val Gly
180 185 190

Asn Ala Gly Pro Ala Gly Pro Ala Gly Pro Arg Gly Glu Val Gly Leu
195 200 205

Pro Gly Leu Ser Gly Pro Val Gly Pro Pro Gly Asn Pro Gly Ala Asn
210 215 220

Gly Leu Thr Gly Ala Lys Gly Ala Ala Gly Leu Pro Gly Val Ala Gly
225 230 235 240

Ala Pro Gly Leu Pro Gly Pro Arg Gly Ile Pro Gly Pro Val Gly Ala
245 250 255

Ala Gly Ala Thr Gly Ala Arg Gly Leu Val Gly Glu Pro Gly Pro Ala
260 265 270

Gly Ser Lys Gly Glu Ser Gly Asn Lys Gly Glu Pro Gly Ser Ala Gly
275 280 285

Pro Gln Gly Pro Pro Gly Pro Ser Gly Glu Glu Gly Lys Arg Gly Pro
290 295 300

Asn Gly Glu Ala Gly Ser Ala Gly Pro Pro Gly Pro Pro Gly Leu Arg
305 310 315 320

Gly Ser Pro Gly Ser Arg Gly Leu Pro Gly Ala Asp Gly Arg Ala Gly
325 330 335

EP 0 992 586 A2 



Val Met Gly Pro Pro Gly Ser Arg Gly Ala Ser Gly Pro Ala Gly Val
340 345 350

Arg Gly Pro Asn Gly Asp Ala Gly Arg Pro Gly Glu Pro Gly Leu Met
355 360 365

Gly Pro Arg Gly Leu Pro Gly Ser Pro Gly Asn Ile Gly Pro Ala Gly
370 375 380

Lys Glu Gly Pro Val Gly Leu Pro Gly Ile Asp Gly Arg Pro Gly Pro
385 390 395 400

Ile Gly Pro Ala Gly Ala Arg Gly Glu Pro Gly Asn Ile Gly Phe Pro
405 410 415

Gly Pro Lys Gly Pro Thr Gly Asp Pro Gly Lys Asn Gly Asp Lys Gly
420 425 430

His Ala Gly Leu Ala Gly Ala Arg Gly Ala Pro Gly Pro Asp Gly Asn
435 440 445

Asn Gly Ala Gln Gly Pro Pro Gly Pro Gln Gly Val Gln Gly Gly Lys
450 455 460

Gly Glu Gln Gly Pro Ala Gly Pro Pro Gly Phe Gln Gly Leu Pro Gly
465 470 475 480

Pro Ser Gly Pro Ala Gly Glu Val Gly Lys Pro Gly Glu Arg Gly Leu
485 490 495

His Gly Glu Phe Gly Leu Pro Gly Pro Ala Gly Pro Arg Gly Glu Arg
500 505 510

Gly Pro Pro Gly Glu Ser Gly Ala Ala Gly Pro Thr Gly Pro Ile Gly
515 520 525

Ser Arg Gly Pro Ser Gly Pro Pro Gly Pro Asp Gly Asn Lys Gly Glu
530 535 540

Pro Gly Val Val Gly Ala Val Gly Thr Ala Gly Pro Ser Gly Pro Ser
545 550 555 560

Gly Leu Pro Gly Glu Arg Gly Ala Ala Gly Ile Pro Gly Gly Lys Gly
565 570 575

Glu Lys Gly Glu Pro Gly Leu Arg Gly Glu Ile Gly Asn Pro Gly Arg
580 585 590

Asp Gly Ala Arg Gly Ala His Gly Ala Val Gly Ala Pro Gly Pro Ala
595 600 605

Gly Ala Thr Gly Asp Arg Gly Glu Ala Gly Ala Ala Gly Pro Ala Gly
610 615 620

Pro Ala Gly Pro Arg Gly Ser Pro Gly Glu Arg Gly Glu Val Gly Pro
625 630 635 640

Ala Gly Pro Asn Gly Phe Ala Gly Pro Ala Gly Ala Ala Gly Gln Pro
645 650 655

Gly Ala Lys Gly Glu Arg Gly Ala Lys Gly Pro Lys Gly Glu Asn Gly
660 665 670

Val Val Gly Pro Thr Gly Pro Val Gly Ala Ala Gly Pro Ala Gly Pro
675 680 685

Asn Gly Pro Pro Gly Pro Ala Gly Ser Arg Gly Asp Gly Gly Pro Pro
690 695 700

Gly Met Thr Gly Phe Pro Gly Ala Ala Gly Arg Thr Gly Pro Pro Gly
705 710 715 720

Pro Ser Gly Ile Ser Gly Pro Pro Gly Pro Pro Gly Pro Ala Gly Lys
725 730 735

Glu Gly Leu Arg Gly Pro Arg Gly Asp Gln Gly Pro Val Gly Arg Thr
740 745 750

Gly Glu Val Gly Ala Val Gly Pro Pro Gly Phe Ala Gly Glu Lys Gly
755 760 765

Pro Ser Gly Glu Ala Gly Thr Ala Gly Pro Pro Gly Thr Pro Gly Pro
770 775 780

Gln Gly Leu Leu Gly Ala Pro Gly Ile Leu Gly Leu Pro Gly Ser Arg
785 790 795 800

Gly Glu Arg Gly Leu Pro Gly Val Ala Gly Ala Val Gly Glu Pro Gly
805 810 815

Pro Leu Gly Ile Ala Gly Pro Pro Gly Ala Arg Gly Pro Pro Gly Ala
820 825 830

Val Gly Ser Pro Gly Val Asn Gly Ala Pro Gly Glu Ala Gly Arg Asp
835 840 845

Gly Asn Pro Gly Asn Asp Gly Pro Pro Gly Arg Asp Gly Gln Pro Gly
850 855 860

His Lys Gly Glu Arg Gly Tyr Pro Gly Asn Ile Gly Pro Val Gly Ala
865 870 875 880

Ala Gly Ala Pro Gly Pro His Gly Pro Val Gly Pro Ala Gly Lys His
885 890 895

Gly Asn Arg Gly Glu Thr Gly Pro Ser Gly Pro Val Gly Pro Ala Gly
900 905 910

Ala Val Gly Pro Arg Gly Pro Ser Gly Pro Gln Gly Ile Arg Gly Asp
915 920 925

Lys Gly Glu Pro Gly Glu Lys Gly Pro Arg Gly Leu Pro Gly Leu Lys
930 935 940

Gly His Asn Gly Leu Gln Gly Leu Pro Gly Ile Ala Gly His His Gly
945 950 955 960

Asp Gln Gly Ala Pro Gly Ser Val Gly Pro Ala Gly Pro Arg Gly Pro
965 970 975

Ala Gly Pro Ser Gly Pro Ala Gly Lys Asp Gly Arg Thr Gly His Pro
980 985 990

Gly Thr Val Gly Pro Ala Gly Ile Arg Gly Pro Gln Gly His Gln Gly
995 1000 1005

Pro Ala Gly Pro Pro Gly Pro Pro Gly Pro Pro Gly Pro Pro Gly Val
1010 1015 1020

Ser Gly Gly Gly Tyr Asp Phe Gly Tyr Asp Gly Asp Phe Tyr Arg Ala
1025 1030 1035 1040

(2) INFORMATION FOR SEQ ID NO: 31:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3120 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:

CAGTACGACG GTAAAGGCGT AGGCCTGGGT CCGGGTCCGA TGGGCCTGAT GGGTCCACGT 60

GGCCCACCGG GTGCAGCAGG TGCGCCGGGT CCGCAGGGCT TCCAAGGTCC GGCGGGTGAA 120

CCGGGCGAAC CGGGTCAGAC GGGTCCGGCG GGTGCTCGCG GTCCGGCTGG CCCACCGGGC 180

AAAGCTGGCG AAGACGGTCA CCCGGGTAAG CCAGGCCGCC CGGGCGAACG TGGCGTCGTG 240

GGTCCGCAAG GTGCGCGTGG TTTCCCGGGC ACGCCGGGTC TGCCGGGTTT CAAAGGCATT 300

CGTGGTCACA ACGGTCTGGA CGGTCTGAAA GGCCAACCGG GTGCTCCGGG CGTCAAAGGC 360

GAACCGGGTG CCCCAGGCGA AAACGGTACG CCGGGCCAGA CTGGTGCGCG TGGTCTGCCG 420

GGTGAACGCG GCCGTGTTGG CGCTCCGGGT CCGGCTGGCG CGCGTGGCAG CGATGGCTCC 480

GTCGGTCCGG TTGGCCCTGC GGGTCCGATT GGTTCCGCTG GCCCTCCGGG TTTCCCGGGT 540

GCGCCGGGTC CGAAGGGTGA GATCGGCGCG GTTGGCAACG CAGGCCCGGC TGGTCCAGCC 600

GGCCCTCGTG GCGAAGTCGG TCTGCCGGGT CTGAGCGGTC CGGTAGGCCC ACCGGGTAAC 660

CCGGGCGCAA ACGGCCTGAC GGGTGCAAAA GGTGCGGCTG GCCTGCCGGG CGTTGCCGGT 720

GCCCCGGGCC TGCCGGGTCC GCGCGGTATT CCGGGTCCGG TAGGCGCAGC CGGTGCAACT 780

GGTGCCCGTG GCCTGGTTGG CGAACCGGGT CCGGCGGGTT CTAAAGGCGA AAGCGGTAAC 840

AAAGGTGAGC CGGGTTCCGC GGGCCCGCAG GGTCCGCCGG GTCCGAGCGG CGAAGAAGGT 900

AAACGTGGTC CGAACGGCGA GGCTGGTTCC GCAGGCCCTC CGGGTCCGCC GGGTCTGCGT 960

GGCAGCCCGG GTAGCCGTGG CCTGCCGGGC GCGGACGGCC GTGCGGGCGT GATGGGTCCG 1020

CCGGGTTCCC GTGGTGCCTC TGGTCCGGCT GGTGTCCGTG GTCCGAATGG CGACGCGGGC 1080

CGTCCGGGTG AACCGGGCCT GATGGGTCCG CGTGGCCTGC CGGGTAGCCC GGGTAACATT 1140

GGTCCGGCGG GTAAGGAGGG TCCGGTAGGT CTGCCGGGTA TTGATGGTCG TCCGGGTCCG 1200

ATCGGCCCTG CGGGCGCTCG TGGCGAGCCG GGTAACATCG GTTTTCCGGG TCCGAAGGGT 1260

CCGACGGGCG ACCCGGGCAA GAACGGTGAT AAAGGCCATG CAGGTCTGGC AGGTGCCCGT 1320

GGTGCACCGG GTCCGGATGG TAACAATGGT GCGCAGGGTC CGCCGGGTCC GCAGGGCGTA 1380

CAGGGTGGCA AAGGTGAACA GGGTCCGGCA GGCCCACCGG GCTTCCAGGG TCTGCCGGGT 1440

CCGAGCGGCC CGGCTGGTGA AGTGGGCAAA CCGGGCGAAC GTGGCCTCCA TGGCGAGTTT 1500

GGCCTGCCGG GTCCGGCCGG TCCGCGTGGT GAGCGCGGCC CTCCGGGCGA ATCCGGCGCG 1560

GCAGGTCCGA CCGGCCCGAT TGGTTCCCGT GGTCCGAGCG GCCCACCGGG TCCGGACGGC 1620

AACAAAGGCG AGCCGGGTGT TGTTGGTGCT GTTGGTACCG CCGGCCCGTC TGGTCCGAGC 1680

GGTCTGCCGG GCGAACGCGG TGCCGCTGGT ATTCCGGGCG GCAAAGGTGA AAAAGGTGAA 1740

CCGGGTCTGC GCGGTGAGAT TGGCAACCCG GGCCGTGACG GTGCTCGCGG TGCACACGGC 1800

GCGGTTGGCG CACCGGGTCC GGCAGGCGCG ACTGGTGATC GTGGCGAAGC TGGTGCAGCG 1860

GGTCCGGCGG GTCCGGCCGG CCCTCGCGGT TCCCCGGGCG AACGCGGCGA AGTCGGCCCG 1920

GCTGGCCCGA ATGGCTTTGC TGGCCCAGCG GGCGCTGCGG GCCAACCGGG TGCGAAAGGT 1980

GAGCGCGGTG CCAAAGGCCC GAAAGGTGAA AATGGTGTAG TTGGTCCGAC GGGTCCGGTT 2040

GGTGCGGCTG GTCCGGCTGG CCCGAATGGT CCGCCGGGTC CGGCAGGCAG CCGTGGCGAT 2100

GGTGGCCCAC CGGGCATGAC CGGTTTCCCT GGCGCGGCCG GTCGCACCGG CCCGCCGGGT 2160

CCGTCTGGCA TTTCTGGCCC ACCGGGTCCG CCGGGTCCGG CGGGCAAAGA AGGTCTGCGT 2220

GGCCCACGCG GCGACCAGGG TCCGGTGGGC CGTACCGGCG AAGTCGGTGC TGTTGGCCCT 2280

CCGGGCTTTG CGGGTGAGAA AGGTCCGAGC GGTGAAGCTG GCACCGCAGG CCCGCCGGGT 2340

ACGCCGGGTC CGCAAGGTCT GCTGGGTGCT CCGGGTATCC TGGGCCTGCC GGGCTCCCGT 2400

GGCGAACGCG GTCTGCCGGG CGTTGCAGGC GCTGTAGGCG AACCGGGTCC GCTGGGTATC 2460

GCGGGTCCGC CGGGTGCGCG TGGTCCGCCG GGTGCCGTGG GCTCTCCGGG TGTTAACGGC 2520

GCCCCTGGTG AAGCGGGCCG CGACGGCAAT CCGGGCAACG ATGGTCCGCC GGGTCGTGAT 2580

GGTCAGCCGG GTCACAAAGG TGAGCGTGGC TACCCGGGTA ACATCGGTCC GGTTGGTGCG 2640

GCCGGCGCTC CGGGTCCGCA CGGTCCGGTA GGCCCAGCCG GCAAACACGG TAACCGTGGT 2700

GAAACGGGTC CGTCCGGTCC GGTAGGTCCG GCGGGTGCTG TTGGTCCACG CGGCCCGTCC 2760

GGCCCGCAGG GTATTCGCGG TGACAAAGGC GAACCGGGCG AAAAAGGTCC GCGTGGTCTG 2820

CCGGGCCTTA AGGGCCACAA CGGTCTGCAA GGTCTGCCGG GTATCGCGGG TCACCACGGT 2880

GATCAGGGTG CTCCGGGTTC CGTTGGTCCG GCCGGTCCGC GTGGCCCGGC TGGTCCGTCT 2940

GGTCCGGCCG GTAAAGACGG CCGTACGGGC CACCCGGGTA CGGTGGGTCC GGCCGGCATT 3000

CGCGGTCCGC AAGGTCACCA GGGTCCGGCG GGTCCGCCGG GTCCGCCGGG TCCGCCGGGT 3060

CCGCCGGGTG TTAGCGGTGG CGGTTATGAT TTTGGTTATG ACGGTGATTT CTATCGTGCG 3120

(2) INFORMATION FOR SEQ ID NO: 32:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1040 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:

Gln Tyr Asp Gly Lys Gly Val Gly Leu Gly Pro Gly Pro Met Gly Leu
1 5 10 15

Met Gly Pro Arg Gly Pro Pro Gly Ala Ala Gly Ala Pro Gly Pro Gln
20 25 30

Gly Phe Gln Gly Pro Ala Gly Glu Pro Gly Glu Pro Gly Gln Thr Gly
35 40 45

Pro Ala Gly Ala Arg Gly Pro Ala Gly Pro Pro Gly Lys Ala Gly Glu
50 55 60

Asp Gly His Pro Gly Lys Pro Gly Arg Pro Gly Glu Arg Gly Val Val
65 70 75 80

Gly Pro Gln Gly Ala Arg Gly Phe Pro Gly Thr Pro Gly Leu Pro Gly
85 90 95

Phe Lys Gly Ile Arg Gly His Asn Gly Leu Asp Gly Leu Lys Gly Gln
100 105 110

Pro Gly Ala Pro Gly Val Lys Gly Glu Pro Gly Ala Pro Gly Glu Asn
115 120 125

Gly Thr Pro Gly Gln Thr Gly Ala Arg Gly Leu Pro Gly Glu Arg Gly
130 135 140

Arg Val Gly Ala Pro Gly Pro Ala Gly Ala Arg Gly Ser Asp Gly Ser
145 150 155 160

Val Gly Pro Val Gly Pro Ala Gly Pro Ile Gly Ser Ala Gly Pro Pro
165 170 175

Gly Phe Pro Gly Ala Pro Gly Pro Lys Gly Glu Ile Gly Ala Val Gly
180 185 190

Asn Ala Gly Pro Ala Gly Pro Ala Gly Pro Arg Gly Glu Val Gly Leu
195 200 205

Pro Gly Leu Ser Gly Pro Val Gly Pro Pro Gly Asn Pro Gly Ala Asn
210 215 220

Gly Leu Thr Gly Ala Lys Gly Ala Ala Gly Leu Pro Gly Val Ala Gly
225 230 235 240

Ala Pro Gly Leu Pro Gly Pro Arg Gly Ile Pro Gly Pro Val Gly Ala
245 250 255

Ala Gly Ala Thr Gly Ala Arg Gly Leu Val Gly Glu Pro Gly Pro Ala
260 265 270

Gly Ser Lys Gly Glu Ser Gly Asn Lys Gly Glu Pro Gly Ser Ala Gly
275 280 285

Pro Gln Gly Pro Pro Gly Pro Ser Gly Glu Glu Gly Lys Arg Gly Pro
290 295 300

Asn Gly Glu Ala Gly Ser Ala Gly Pro Pro Gly Pro Pro Gly Leu Arg
305 310 315 320

Gly Ser Pro Gly Ser Arg Gly Leu Pro Gly Ala Asp Gly Arg Ala Gly
325 330 335

Val Met Gly Pro Pro Gly Ser Arg Gly Ala Ser Gly Pro Ala Gly Val
340 345 350

Arg Gly Pro Asn Gly Asp Ala Gly Arg Pro Gly Glu Pro Gly Leu Met
355 360 365

Gly Pro Arg Gly Leu Pro Gly Ser Pro Gly Asn Ile Gly Pro Ala Gly
370 375 380

Lys Glu Gly Pro Val Gly Leu Pro Gly Ile Asp Gly Arg Pro Gly Pro
385 390 395 400

Ile Gly Pro Ala Gly Ala Arg Gly Glu Pro Gly Asn Ile Gly Phe Pro
405 410 415

Gly Pro Lys Gly Pro Thr Gly Asp Pro Gly Lys Asn Gly Asp Lys Gly
420 425 430

His Ala Gly Leu Ala Gly Ala Arg Gly Ala Pro Gly Pro Asp Gly Asn
435 440 445

Asn Gly Ala Gln Gly Pro Pro Gly Pro Gln Gly Val Gln Gly Gly Lys
450 455 460

Gly Glu Gln Gly Pro Ala Gly Pro Pro Gly Phe Gln Gly Leu Pro Gly
465 470 475 480

Pro Ser Gly Pro Ala Gly Glu Val Gly Lys Pro Gly Glu Arg Gly Leu
485 490 495

His Gly Glu Phe Gly Leu Pro Gly Pro Ala Gly Pro Arg Gly Glu Arg
500 505 510

Gly Pro Pro Gly Glu Ser Gly Ala Ala Gly Pro Thr Gly Pro Ile Gly
515 520 525

Ser Arg Gly Pro Ser Gly Pro Pro Gly Pro Asp Gly Asn Lys Gly Glu
530 535 540

Pro Gly Val Val Gly Ala Val Gly Thr Ala Gly Pro Ser Gly Pro Ser
545 550 555 560

Gly Leu Pro Gly Glu Arg Gly Ala Ala Gly Ile Pro Gly Gly Lys Gly
565 570 575

Glu Lys Gly Glu Pro Gly Leu Arg Gly Glu Ile Gly Asn Pro Gly Arg
580 585 590

Asp Gly Ala Arg Gly Ala His Gly Ala Val Gly Ala Pro Gly Pro Ala
595 600 605

Gly Ala Thr Gly Asp Arg Gly Glu Ala Gly Ala Ala Gly Pro Ala Gly
610 615 620

Pro Ala Gly Pro Arg Gly Ser Pro Gly Glu Arg Gly Glu Val Gly Pro
625 630 635 640

Ala Gly Pro Asn Gly Phe Ala Gly Pro Ala Gly Ala Ala Gly Gln Pro
645 650 655

Gly Ala Lys Gly Glu Arg Gly Ala Lys Gly Pro Lys Gly Glu Asn Gly
660 665 670

Val Val Gly Pro Thr Gly Pro Val Gly Ala Ala Gly Pro Ala Gly Pro
675 680 685

Asn Gly Pro Pro Gly Pro Ala Gly Ser Arg Gly Asp Gly Gly Pro Pro
690 695 700

Gly Met Thr Gly Phe Pro Gly Ala Ala Gly Arg Thr Gly Pro Pro Gly
705 710 715 720

Pro Ser Gly Ile Ser Gly Pro Pro Gly Pro Pro Gly Pro Ala Gly Lys
725 730 735

Glu Gly Leu Arg Gly Pro Arg Gly Asp Gln Gly Pro Val Gly Arg Thr
740 745 750

Gly Glu Val Gly Ala Val Gly Pro Pro Gly Phe Ala Gly Glu Lys Gly
755 760 765

Pro Ser Gly Glu Ala Gly Thr Ala Gly Pro Pro Gly Thr Pro Gly Pro
770 775 780

Gln Gly Leu Leu Gly Ala Pro Gly Ile Leu Gly Leu Pro Gly Ser Arg
785 790 795 800

Gly Glu Arg Gly Leu Pro Gly Val Ala Gly Ala Val Gly Glu Pro Gly
805 810 815

Pro Leu Gly Ile Ala Gly Pro Pro Gly Ala Arg Gly Pro Pro Gly Ala
820 825 830

Val Gly Ser Pro Gly Val Asn Gly Ala Pro Gly Glu Ala Gly Arg Asp
835 840 845

Gly Asn Pro Gly Asn Asp Gly Pro Pro Gly Arg Asp Gly Gln Pro Gly
850 855 860

His Lys Gly Glu Arg Gly Tyr Pro Gly Asn Ile Gly Pro Val Gly Ala
865 870 875 880

Ala Gly Ala Pro Gly Pro His Gly Pro Val Gly Pro Ala Gly Lys His
885 890 895

Gly Asn Arg Gly Glu Thr Gly Pro Ser Gly Pro Val Gly Pro Ala Gly
900 905 910

Ala Val Gly Pro Arg Gly Pro Ser Gly Pro Gln Gly Ile Arg Gly Asp
915 920 925

Lys Gly Glu Pro Gly Glu Lys Gly Pro Arg Gly Leu Pro Gly Leu Lys
930 935 940

Gly His Asn Gly Leu Gln Gly Leu Pro Gly Ile Ala Gly His His Gly
945 950 955 960

Asp Gln Gly Ala Pro Gly Ser Val Gly Pro Ala Gly Pro Arg Gly Pro
965 970 975

Ala Gly Pro Ser Gly Pro Ala Gly Lys Asp Gly Arg Thr Gly His Pro
980 985 990

Gly Thr Val Gly Pro Ala Gly Ile Arg Gly Pro Gln Gly His Gln Gly
995 1000 1005

Pro Ala Gly Pro Pro Gly Pro Pro Gly Pro Pro Gly Pro Pro Gly Val
1010 1015 1020

Ser Gly Gly Gly Tyr Asp Phe Gly Tyr Asp Gly Asp Phe Tyr Arg Ala
1025 1030 1035 1040

(2) INFORMATION FOR SEQ ID NO: 33:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 76 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33: 
GGAATTCATG CAGTATGATG GCAAAGGCGT CGGCCTCGGC CCGGGCCCAA TGGGCCTCAT 
GGGCCCGCGC GGCCCA 

(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 79 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:34: 
CCGGGCGCGC CGGGTGGCCC ACGTCGACCG CGGGGTCCGG GCGTTCCAAA GGTCCCGGGA 
CGGCCAATTA TTCGAACCC 
(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 82 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 
GGAATTCGCC GGTGAGCCGG GTGAACCGGG CCAAACGGGT CCGGCAGGTC CACGTGGTCC 
AGCGGGCCCG CCTGGCAAGG CG 
(2) INFORMATION FOR SEQ ID NO:36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 84 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36: 

CCGGGCGGAC CGTTCCGCCC ACTTCTACCG GTGGGACCGT TTGGCCCGGC GGGCCACTCG 

CACCGCATCA CATTATTCGA ACCC 

(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 240 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37: 

CAGTATGATG GCAAAGGCGT CGGCCTCGGC CCGGGCCCAA TGGGCCTCAT GGGCCCGCGC 60 

GGCCCACCGG GTGCAGCTGG CGCCCCAGGC CCGCAAGGTT TCCAGGGCCC TGCCGGTGAG 120 

CCGGGTGAAC CGGGCCAAAC GGGTCCGGCA GGTGCACGTG GTCCAGCGGG CCCGCCTGGC 180 

AAGGCGGGTG AAGATGGCCA CCCTGGCAAA CCGGGCCGCC CGGGTGAGCG TGGCGTAGTG 240



(2) INFORMATION FOR SEQ ID NO:38: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 80 amino acids 
(B) TYPE: amino acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: unknown 



(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:38:

Gln Tyr Asp Gly Lys Gly Val Gly Leu Gly Pro Gly Pro Met Gly Leu
1 5 10 15

Met Gly Pro Arg Gly Pro Pro Gly Ala Ala Gly Ala Pro Gly Pro Gln
20 25 30

Gly Phe Gln Gly Pro Ala Gly Glu Pro Gly Glu Pro Gly Gln Thr Gly
35 40 45

Pro Ala Gly Ala Arg Gly Pro Ala Gly Pro Pro Gly Lys Ala Gly Glu
50 55 60

Asp Gly His Pro Gly Lys Pro Gly Arg Pro Gly Glu Arg Gly Val Val
65 70 75 80



(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 276 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:

ATGGGGCTCG CTGGCCCACC GGGCGAACCG GGTCCGCCAG GCCCGAAAGG TCCGCGTGGC 60

GATAGCGGGC TCGCTGGCCC ACCGGGCGAA CCGGGTCCGC CAGGCCCGAA AGGTCCGCGT 120

GGCGATAGCG GGCTCGCTGG CCCACCGGGC GAACCGGGTC CGCCAGGCCC GAAAGGTCCG 180

CGTGGCGATA GCGGGCTCGC TGGCCCACCG GGCGAACCGG GTCCGCCAGG CCCGAAAGGT 240

CCGCGTGGCG ATAGCGGGCT CCCGGGCGAT TCCTAA 276

(2) INFORMATION FOR SEQ ID NO: 40:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 91 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40: 

Met Gly Leu Ala Gly Pro Pro Gly Glu Pro Gly Pro Pro Gly Pro Lys 
1 5 10 15 

Gly Pro Arg Gly Asp Ser Gly Leu Ala Gly Pro Pro Gly Glu Pro Gly 
20 25 30 

Pro Pro Gly Pro Lys Gly Pro Arg Gly Asp Ser Gly Leu Ala Gly Pro 
35 40 45 

Pro Gly Glu Pro Gly Pro Pro Gly Pro Lys Gly Pro Arg Gly Asp Ser 
50 55 60 

Gly Leu Ala Gly Pro Pro Gly Glu Pro Gly Pro Pro Gly Pro Lys Gly 
65 70 75 80 

Pro Arg Gly Asp Ser Gly Leu Pro Gly Asp Ser 
85 90 
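
The relationship between SEQ ID NO: 39 (276 bases) and SEQ ID NO: 40 (91 residues) can be checked with a short script. The following is a minimal sketch, not part of the original listing: it assumes the standard genetic code and a reading frame starting at the first base, and the names `CODON_TABLE`, `SEQ_ID_39` and `translate` are illustrative only.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order
# (assumption: no alternative translation table applies here).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# SEQ ID NO: 39 as printed in the listing; the spaces between
# ten-base blocks are stripped before use.
SEQ_ID_39 = (
    "ATGGGGCTCG CTGGCCCACC GGGCGAACCG GGTCCGCCAG GCCCGAAAGG TCCGCGTGGC "
    "GATAGCGGGC TCGCTGGCCC ACCGGGCGAA CCGGGTCCGC CAGGCCCGAA AGGTCCGCGT "
    "GGCGATAGCG GGCTCGCTGG CCCACCGGGC GAACCGGGTC CGCCAGGCCC GAAAGGTCCG "
    "CGTGGCGATA GCGGGCTCGC TGGCCCACCG GGCGAACCGG GTCCGCCAGG CCCGAAAGGT "
    "CCGCGTGGCG ATAGCGGGCT CCCGGGCGAT TCCTAA"
).replace(" ", "")


def translate(dna):
    """Translate a coding strand in frame 1, stopping at the first stop codon."""
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":  # stop codon: end of the open reading frame
            break
        protein.append(residue)
    return "".join(protein)


protein = translate(SEQ_ID_39)
print(len(SEQ_ID_39), len(protein))
print(protein)
```

Under these assumptions the translation in one-letter code should agree residue for residue with SEQ ID NO: 40, with the final TAA codon read as a stop. The same function could be applied to the other cDNA/peptide pairs in this listing.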
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(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 13 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41:

Gly Pro Pro Gly Leu Ala Gly Pro Pro Gly Glu Ser Gly
1 5 10 

(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide

(ix) FEATURE:

(A) NAME/KEY: Modified-site

(B) LOCATION: 2..3

(D) OTHER INFORMATION: /product= "4-hydroxyproline"

(ix) FEATURE:

(A) NAME/KEY: Modified-site

(B) LOCATION: 8..9

(D) OTHER INFORMATION: /product= "Xaa = 4-hydroxyproline"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:

Gly Xaa Xaa Gly Leu Ala Gly Xaa Xaa Gly Glu Ser Gly
1 5 10
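
The two Modified-site features of SEQ ID NO: 42 can be read as positional annotations on the peptide of SEQ ID NO: 41. The sketch below is illustrative only: it assumes that SEQ ID NO: 42 is SEQ ID NO: 41 with the prolines at positions 2..3 and 8..9 written as Xaa (4-hydroxyproline), and the function name `mark_modified` is hypothetical.

```python
# Sketch: apply the Modified-site ranges of SEQ ID NO: 42 to SEQ ID NO: 41.
# Assumption: Xaa marks a proline replaced by 4-hydroxyproline.
SEQ_ID_NO_41 = "Gly Pro Pro Gly Leu Ala Gly Pro Pro Gly Glu Ser Gly".split()
MODIFIED_SITES = [(2, 3), (8, 9)]  # 1-based, inclusive, from the FEATURE entries


def mark_modified(residues, sites, symbol="Xaa"):
    """Return a copy of residues with every Pro inside the given ranges replaced."""
    out = list(residues)
    for start, end in sites:
        for pos in range(start, end + 1):
            if out[pos - 1] != "Pro":
                raise ValueError(f"position {pos} is {out[pos - 1]}, not Pro")
            out[pos - 1] = symbol
    return out


seq_42 = mark_modified(SEQ_ID_NO_41, MODIFIED_SITES)
print(" ".join(seq_42))
```

The check inside the loop raises an error if a listed position does not hold proline, which guards against an off-by-one reading of the LOCATION fields.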

(2) INFORMATION FOR SEQ ID NO: 43: 



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 660 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43: 

ATGGGCCCGC CGGGTCTGGC GGGCCCTCCG GGTGAAAGCG GTCGTGAAGG CGCGCCGGGT 60 

GCCGAAGGCA GCCCAGGCCG CGACGGTAGC CCGGGGGCCA AAGGGGATCG TGGTGAAACC 120 

GGCCCGGCGG GCCCCCCGGG TGCACCGGGC GCGCCGGGTG CCCCAGGCCC GGTGGGCCCG 180 

GCGGGCAAAA GCGGTGATCG TGGTGAGACC GGTCCGGCGG GCCCGGCCGG TCCGGTGGGC 240 

CCAGCGGGCG CCCGTGGCCC GGCCGGTCCG CAGGGCCCGC GGGGTGACAA AGGTGAAACG 300 

GGCGAACAGG GCGACCGTGG CATTAAAGGC CACCGTGGCT TCAGCGGCCT GCAGGGTCCA 360

CCGGGCCCGC CGGGCAGTCC GGGTGAACAG GGTCCGTCCG GAGCCAGCGG GCCGGCGGGC 420

CCACGCGGTC CGCCGGGCAG CGCGGGCGCG CCGGGCAA^G ACGGTCTGAA CGGTCTGCCG 480 

GGCCCGATCG GCCCGCCGGG CCCACGCGGC CGCACCGGTG ATGCGGGTCC GGTGGGTCCC 540 

CCGGGCCCGC CGGGCCCGCC AGGCCCGCCG GGACCGCCGA GCGCGGGTTT CGACTTCAGC 600 

TTCCTGCCGC AGCCGCCGCA GGAGAAAGCG CACGACGGCG GTCGCTACTA CCGTGCGTAA 660 



(2) INFORMATION FOR SEQ ID NO: 44:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 219 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: unknown 



(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44: 



Met Gly Pro Pro Gly Leu Ala Gly Pro Pro Gly Glu Ser Gly Arg Glu
1 5 10 15

Gly Ala Pro Gly Ala Glu Gly Ser Pro Gly Arg Asp Gly Ser Pro Gly
20 25 30

Ala Lys Gly Asp Arg Gly Glu Thr Gly Pro Ala Gly Pro Pro Gly Ala
35 40 45

Pro Gly Ala Pro Gly Ala Pro Gly Pro Val Gly Pro Ala Gly Lys Ser
50 55 60

Gly Asp Arg Gly Glu Thr Gly Pro Ala Gly Pro Ala Gly Pro Val Gly
65 70 75 80

Pro Ala Gly Ala Arg Gly Pro Ala Gly Pro Gln Gly Pro Arg Gly Asp
85 90 95

Lys Gly Glu Thr Gly Glu Gln Gly Asp Arg Gly Ile Lys Gly His Arg
100 105 110

Gly Phe Ser Gly Leu Gln Gly Pro Pro Gly Pro Pro Gly Ser Pro Gly
115 120 125

Glu Gln Gly Pro Ser Gly Ala Ser Gly Pro Ala Gly Pro Arg Gly Pro
130 135 140

Pro Gly Ser Ala Gly Ala Pro Gly Lys Asp Gly Leu Asn Gly Leu Pro
145 150 155 160

Gly Pro Ile Gly Pro Pro Gly Pro Arg Gly Arg Thr Gly Asp Ala Gly
165 170 175

Pro Val Gly Pro Pro Gly Pro Pro Gly Pro Pro Gly Pro Pro Gly Pro
180 185 190

Pro Ser Ala Gly Phe Asp Phe Ser Phe Leu Pro Gln Pro Pro Gln Glu
195 200 205

Lys Ala His Asp Gly Gly Arg Tyr Tyr Arg Ala
210 215



(2) INFORMATION FOR SEQ ID NO: 45:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 627 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45: 

ATGGGCTCTC CGGGTGTTAA CGGCGCCCCT GGTGAAGCGG GCCGCGACGG CAATCCGGGC 60 

AACGATGGTC CGCCGGGTCG TGATGGTCAG CCGGGTCACA AAGGTGAGCG TGGCTACCCG 120 

GGTAACATCG GTCCGGTTGG TGCGGCCGGC GCTCCGGGTC CGCACGGTCC GGTAGGCCCA 180 

GCCGGCAAAC ACGGTAACCG TGGTGAAACG GGTCCGTCCG GTCCGGTAGG TCCGGCGGGT 240 

GCTGTTGGTC CACGCGGCCC GTCCGGCCCG CAGGGTATTC GCGGTGACAA AGGCGAACCG 300 

GGCGAAAAAG GTCCGCGTGG TCTGCCGGGC CTTAAGGGCC ACAACGGTCT GCAAGGTCTG 360 

CCGGGTATCG CGGGTCACCA CGGTGATCAG GGTGCTCCGG GTTCCGTTGG TCCGGCCGGT 420 

CCGCGTGGCC CGGCTGGTCC GTCTGGTCCG GCCGGTAAAG ACGGCCGTAC GGGCCACCCG 480

GGTACGGTGG GTCCGGCCGG CATTCGCGGT CCGCAAGGTC ACCAGGGTCC GGCGGGTCCG 540 

CCGGGTCCGC CGGGTCCGCC GGGTCCGCCG GGTGTTAGCG GTGGCGGTTA TGATTTTGGT 600 

TATGACGGTG ATTTCTATCG TGCGTAA 627

(2) INFORMATION FOR SEQ ID NO: 46:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 219 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46: 

Met Gly Pro Pro Gly Leu Ala Gly Pro Pro Gly Glu Ser Gly Arg Glu
1 5 10 15

Gly Ala Pro Gly Ala Glu Gly Ser Pro Gly Arg Asp Gly Ser Pro Gly
20 25 30

Ala Lys Gly Asp Arg Gly Glu Thr Gly Pro Ala Gly Pro Pro Gly Ala
35 40 45

Pro Gly Ala Pro Gly Ala Pro Gly Pro Val Gly Pro Ala Gly Lys Ser
50 55 60

Gly Asp Arg Gly Glu Thr Gly Pro Ala Gly Pro Ala Gly Pro Val Gly
65 70 75 80

Pro Ala Gly Ala Arg Gly Pro Ala Gly Pro Gln Gly Pro Arg Gly Asp
85 90 95

Lys Gly Glu Thr Gly Glu Gln Gly Asp Arg Gly Ile Lys Gly His Arg
100 105 110

Gly Phe Ser Gly Leu Gln Gly Pro Pro Gly Pro Pro Gly Ser Pro Gly
115 120 125

Glu Gln Gly Pro Ser Gly Ala Ser Gly Pro Ala Gly Pro Arg Gly Pro
130 135 140

Pro Gly Ser Ala Gly Ala Pro Gly Lys Asp Gly Leu Asn Gly Leu Pro
145 150 155 160

Gly Pro Ile Gly Pro Pro Gly Pro Arg Gly Arg Thr Gly Asp Ala Gly
165 170 175

Pro Val Gly Pro Pro Gly Pro Pro Gly Pro Pro Gly Pro Pro Gly Pro
180 185 190

Pro Ser Ala Gly Phe Asp Phe Ser Phe Leu Pro Gln Pro Pro Gln Glu
195 200 205

Lys Ala His Asp Gly Gly Arg Tyr Tyr Arg Ala
210 215

(2) INFORMATION FOR SEQ ID NO:47: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 95 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47:

GGAATTCTCC CATGGGCCCG CCGGGTCTGG CGGGCCCTCC GGGTGAAAGC GGTCGTGAAG 60
GCGCGCCGGG TGCCGAAGGC AGCCCAGGCC GCGAC 95

(2) INFORMATION FOR SEQ ID NO: 48:



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 97 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:48: 
CTTCCGTCGG GTCCGGCGCT GCCATCGGGC CCCCGGTTTC CCCTAGCACC ACTTTGGCCG 60 
GGCCGCCCGG GGGGCCCACG TGGCATTATT CGAACCC 97 



(2) INFORMATION FOR SEQ ID NO:49: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 91 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49:

GGAATTCGGT GCACCGGGCG CGCCGGGTGC CCCAGGCCCG GTGGGCCCGG CGGGCAAAAG 60
CGGTGATCGT GGCGAGACCG GTCCGGCGGG C 91

(2) INFORMATION FOR SEQ ID NO: 50:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 91 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50:

CTCTGGCCAG GCCGCCCGGG CCGGCCAGGC CACCCGGGTC GCCCGCGGGC ACCGGGCCGG 60
CCAGGCGTCC CGGGCGCCAT TATTCGAACC C 91



Claims 

1. A method of producing an Extracellular Matrix Protein (EMP) or fragment thereof capable of providing a self-aggregate in a cell which does not ordinarily hydroxylate proline comprising

providing a nucleic acid sequence encoding the EMP or fragment thereof which has been optimized for expression in the cell by substitution of codons preferred by the cell for naturally occurring codons not preferred by the cell;

incorporating the nucleic acid sequence into the cell;

providing hypertonic growth media containing at least one amino acid selected from the group consisting of trans-4-hydroxyproline and 3-hydroxyproline; and

contacting the cell with the growth media wherein the at least one amino acid is assimilated into the cell and incorporated into the EMP or fragment thereof.

2. A method of producing an Extracellular Matrix Protein (EMP) or fragment thereof according to claim 1 wherein the EMP is selected from the group consisting of human collagen, fibrinogen, fibronectin and collagen-like peptide.

3. A method of producing an Extracellular Matrix Protein (EMP) or fragment thereof according to claim 1 or 2, wherein the cell is a prokaryote.

4. A method of producing an Extracellular Matrix Protein (EMP) or fragment thereof according to claim 3, wherein the prokaryote is E. coli.

5. A method of producing an Extracellular Matrix Protein (EMP) or fragment thereof according to any of claims 2-4, wherein the human collagen is Type I (α1).

6. A method of producing an Extracellular Matrix Protein (EMP) or fragment thereof according to claim 5, wherein the nucleic acid encoding human collagen Type I (α1) includes the sequence shown in SEQ.ID.NO.19.

7. A method of producing an Extracellular Matrix Protein (EMP) or fragment thereof according to any of claims 2 to 4, wherein the human collagen is Type I (α2).

8. A method of producing an Extracellular Matrix Protein (EMP) or fragment thereof according to claim 7, wherein the nucleic acid encoding human collagen Type I (α2) includes the sequence shown in SEQ.ID.NO.31.

9. A method of producing an Extracellular Matrix Protein (EMP) or fragment thereof according to any of claims 1 to 8, wherein the nucleic acid encoding the EMP includes the sequence shown in SEQ.ID.NO. 43.

10. A method of producing an Extracellular Matrix Protein (EMP) or fragment thereof according to any of claims 1 to 8, wherein the nucleic acid encoding the EMP includes the sequence shown in SEQ.ID.NO. 46.

11. A method of producing an Extracellular Matrix Protein (EMP) or fragment thereof according to any of claims 1 to 10, wherein the nucleic acid sequence includes nucleic acid encoding a physiologically active peptide.

12. A method of producing an Extracellular Matrix Protein (EMP) or fragment thereof according to claim 11, wherein the physiologically active peptide is selected from the group consisting of bone morphogenic protein, transforming growth factor-β and decorin.

13. A method of producing an Extracellular Matrix Protein (EMP) or fragment thereof according to any of claims 1 to 4, wherein the EMP or fragment thereof is a collagen-like peptide.

14. A method of producing an Extracellular Matrix Protein (EMP) or fragment thereof according to claim 13, wherein the EMP or fragment thereof includes the amino acid sequence depicted in SEQ.ID.NO. 4.

15. A method of producing an Extracellular Matrix Protein (EMP) or fragment thereof according to claim 13, wherein the EMP includes the amino acid sequence depicted in SEQ.ID.NO.40.

16. A method of producing an Extracellular Matrix Protein (EMP) or fragment thereof according to claim 1, wherein the EMP includes the amino acid sequence depicted in SEQ.ID.NO. 44.

17. A method of producing an Extracellular Matrix Protein (EMP) or fragment thereof according to claim 1, wherein the EMP is a collagen fragment including the amino acid sequence depicted in SEQ.ID.NO. 26.

18. A method of producing an Extracellular Matrix Protein (EMP) or fragment thereof according to claim 1, wherein the EMP is a collagen fragment including the amino acid sequence depicted in SEQ.ID.NO. 46.

19. Nucleic acid encoding a chimeric protein comprising a domain from a physiologically active peptide and a domain from an Extracellular Matrix Protein (EMP) which is capable of providing a self-aggregate.

20. Nucleic acid encoding a chimeric protein according to claim 19, wherein said EMP is selected from the group consisting of human collagen, fibrinogen, fibronectin and collagen-like peptide.

21. Nucleic acid encoding a chimeric protein according to claim 19 or 20 wherein said domain from a physiologically active peptide is selected from the group consisting of bone morphogenic protein, transforming growth factor-β and decorin.

22. Nucleic acid encoding a chimeric protein according to any of claims 19-21, wherein said chimeric protein includes the sequence shown in SEQ.ID.NO.6.

23. Nucleic acid encoding a chimeric protein according to any of claims 19-21, wherein said chimeric protein includes the sequence shown in SEQ.ID.NO.8.

24. Nucleic acid encoding a chimeric protein according to any of claims 19-21, wherein said chimeric protein includes the sequence shown in SEQ.ID.NO.11.

25. Nucleic acid encoding a chimeric protein according to any of claims 19-21, wherein said chimeric protein includes the sequence shown in SEQ.ID.NO.10.

26. A cloning vector comprising nucleic acid according to any of claims 19-21.

27. A cloning vector according to claim 26 wherein said cloning vector is selected from the group consisting of plasmid, phage, cosmid and artificial chromosome.

28. A cell transformed by a vector according to claim 26 or 27.

29. A chimeric protein comprising a domain from a physiologically active peptide and a domain from an Extracellular Matrix Protein (EMP) which is capable of providing a self-aggregate.

30. A chimeric protein according to claim 29 wherein said EMP is selected from the group consisting of human collagen, fibrinogen, fibronectin and collagen-like peptide.

31. A chimeric protein according to claim 29 or 30 wherein said domain from a physiologically active peptide is selected from the group consisting of bone morphogenic protein, transforming growth factor-β and decorin.

32. A chimeric protein according to any of claims 29-31, wherein said chimeric protein includes the sequence shown in SEQ.ID.NO.6.

33. A chimeric protein according to any of claims 29-31, wherein said chimeric protein includes the sequence shown in SEQ.ID.NO.8.

34. A chimeric protein according to any of claims 29-31, wherein said chimeric protein includes the sequence shown in SEQ.ID.NO.10.

35. A chimeric protein according to any of claims 29-31, wherein said chimeric protein includes the sequence shown in SEQ.ID.NO.11.

36. Human collagen or fragment thereof produced by a prokaryotic cell, the human collagen or fragment thereof being capable of providing a self-aggregate.

37. Human collagen or fragment thereof produced by a prokaryotic cell according to claim 36 wherein the human collagen or fragment thereof is encoded for by nucleic acid having the sequence shown in SEQ.ID.NO.19.

38. Human collagen or fragment thereof produced by a prokaryotic cell according to claim 36 wherein the human collagen or fragment thereof is encoded for by nucleic acid having the sequence shown in SEQ.ID.NO.39.

39. Human collagen or fragment thereof produced by a prokaryotic cell according to claim 36 wherein the human collagen or fragment thereof is encoded for by nucleic acid having the sequence shown in SEQ.ID.NO.43.

40. Human collagen or fragment thereof produced by a prokaryotic cell according to claim 36 wherein the human collagen or fragment thereof is encoded for by nucleic acid having the sequence shown in SEQ.ID.NO.45.

41. Human collagen or fragment thereof produced by a prokaryotic cell according to claim 36 wherein the collagen or fragment thereof is encoded for by nucleic acid having the sequence shown in SEQ.ID.NO.31.

42. Nucleic acid comprising the sequence shown in SEQ.ID.NO. 19.

43. Nucleic acid comprising the sequence shown in SEQ.ID.NO. 31.

44. Nucleic acid comprising the sequence shown in SEQ.ID.NO. 43.

45. Nucleic acid comprising the sequence shown in SEQ.ID.NO. 45.

46. Nucleic acid encoding a human Extracellular Matrix Protein (EMP) or fragment thereof wherein the codon usage in the nucleic acid sequence reflects preferred codon usage in a prokaryotic cell.

47. Nucleic acid according to claim 46 wherein the prokaryotic cell is E. coli.

48. Nucleic acid according to claim 43 wherein the EMP is selected from the group consisting of collagen, fibrinogen, fibronectin and collagen-like peptide.
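
Claims 1, 46 and 47 recite substituting codons preferred by the host cell for naturally occurring codons. The idea can be sketched in Python as a synonymous recoding that leaves the encoded peptide unchanged. This is an illustrative sketch only: the `PREFERRED` table is a hypothetical choice of one codon per amino acid and is not taken from the specification, the input fragment is invented for the example, and the function names are assumptions.

```python
# Sketch: replace each codon with a host-preferred synonymous codon.
# PREFERRED and the input fragment are hypothetical, for illustration only.
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Hypothetical preferred codon per amino acid (one-letter code).
PREFERRED = {"G": "GGC", "P": "CCG", "E": "GAA", "A": "GCG", "L": "CTG"}


def translate(dna):
    """Translate every complete codon of a coding strand, in frame 1."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def substitute_preferred(dna, preferred):
    """Swap each codon for the preferred synonymous codon, if one is listed."""
    out = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        out.append(preferred.get(CODON_TABLE[codon], codon))
    return "".join(out)


natural = "GGTCCCCCTGGAGAG"  # invented fragment encoding Gly-Pro-Pro-Gly-Glu
optimized = substitute_preferred(natural, PREFERRED)
print(optimized)
print(translate(natural) == translate(optimized))
```

Codons for amino acids absent from the table are left as they are, so the recoded sequence always translates to the same peptide as the input.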

FIG. 1

CRGCTGTCTT ATGGCTATGA TGAGAAATCA ACCGGAGGAA TTTCCGTGCC
TGGCCCCATG GGTCCCTCTG GTOCTCGTGG TCTOCCTGGC CCCCCTGGTG 
CACCTGGTCC CCAAGGCTTC CAAGGTCCCC CTGGTGAGCC TGGCGRGCCT 
GGAGCTTCAG GTCCCATGGG TCOCCGAGGT CCCOCAGGTC CCCCTGGAAA 
GAATGGAGAT GATGGGGAAG CTGGAAAfiCC TGGTCGTCCT GGTGAGCGTG 
GGCCTCCTGG GCCTCAGGGT GCTGGAGGAT TGCCCGGAAC AGCTGGCeTC 
CCTGGAATGA AGGGACACAG AGGTTTCAGT GGTTTGGATG GTGCCAAGGG 
AGATGCTGGT CCTGCTGGTC CTAAGGGTGA GCCTGGCAGC CCTGGTGAAA 
ATGGAGCTCC TGGTCAGATG GGCCCCCGTG GCCTGCCTGG TGAGAGAGGT 
CGCCCTGGAG CCCCTGGCCC TGCTGGTGCT CGTGGAAATG ATGGTGCTAC 
TGGTQCTGCC GGGCCCCCTG GTCCCACCGG CCCCGCTGGT CCTCCTGGCT 
TCCCTGGTGC TGTTGGTGCT AAGGGTGAAG CTGGTCCCCA AGGGGCCCGA 
GGCTCTGAAG GTCCCCAGGG TGTGOGTGGT GAQCCTGGCC CCCCTGGCCC 
TGCTGGTGCT GCTGGCCCTG CTGGAAACCC TGGTGCTGAT GGACAGCCTG 
GTGCTAAAGG TGCCAAXGGT GCTCCTGGTA TTGCTGGTGC TCCTGGCTTC 
CCTGGTGCCC GAGGCCCCTC TGGRCCCCAG GGOCOCGGCG GCCCTCCTGG 
TCCCAAGGGT AACAGCGGTG AACCTGGTGC TCCTGGCAGC AAAGGAGACA 
CTGGTGCTAA GGGAGAGCCT GGCCCTGTTG GTGTTCAAGG ACCCCCTGGC 
CCTGCTGGAG AGGAAGGAAA GCG5GGAGCT CGAGGTGAAC CCGGACCCAC 
TGGCCTGCCC GGACCCCCTG GCGSGCGTGG TGGACCTGGT AGCCGTGGTT 
TCCCTGGCGC AGATGGTGTT GCTGGTCCCA AGGGTCCCGC TGGTGA&CGT 
GGTTCTCCTG GCCCCGCTGG CCCCAA\GGA TCTCCTGGTG AAGCTGGTCG 
TCCCGGTGAA GCTGGTCTGC CTGGTGGCAA GGGTCIGACT GGAAGCCCTG 
GCAGCCCTGG TCCTGATGGC AAAACTGQCC CCCCTGGTCC CGCCGGTCAA 
GATGGTCGCC CCGGACCCCC AGGCCCACCT GGTGCCCGTG GTCAGGCTGG 
TGTGATGGGA TTCCCTGGAC CTAAAGGTGC TGCTGGAGAG CCCGGCAAGG 
CTGGAGAGCG AGGTGTTCCC GGACCCCCTG GCGCTGTCGG TCCTGCTGGC 
AAAGATGGAG AGGCTGGAGC TCAGGGAGCC CCTGGCCCTG CTGGTCCCGC 
TGGCGAGAGA GGTGAACAAG GCCCTGCTGG CTCCCCCGGA TTCCAGGGTC 
TCCCTGGTCC TGCTGGTCCT GCAGGTGAAG CAGGCAA?JCC TGGTGAACAG 
GGTGTTCCTG GAGACCTTGG CGGCCCTGGC CCCTCTGGAG OAGAGGCGA 
GajSAGGTTTC CCTGGCGAGC GTGGTGIGCA AGGTCCCCCT GGTCCTGCTG 
GACCCCGAGG GGCCAACGGT GCTCCCGGCA ACGATGGTGC TAAGGGTGAT 
GCTGGTGCCC CTGGAGCTCC CGGTAGCCAG GGCGCCCCTG GCCTTCAGGG 
AATGCCTGGT GAACGTGGTG CAGCTGGTCT TCCAGGGCCT AAGGGTGACA. 
GAGGTGATGC TGGTCCCAAA GGTGCTGATG GCTCTCCTGG CAAAGATGGC 
CTXEGTGGTC TGACCGGCCC CA1TGGTCCT CCTGGCCCTG CTGGTGCCCC.
TGGTGACAAG GGTGAAAGTG GTCCCAGCGG CCCTGCTGGT CCCACTGGAG 
CTCGTGGTGC CCCCGGAGftC CGTGGTGAGC CTGGTCCCCC CGGCOCTGCT 
GGCTTTGCIG GCCCCCCTGG TGCTGACGGC CAftGCTGGTG CTMAGGCGA. 
ACCTGGTGAT GCTGGTGCCA AAGGCGATQC TGGTCCCCCT GGGCCTGCCG 
GACCCGCTGG AOCCCCTGGC CCCATTGGTA ATGTTGGTGC TCCTGGAGCC 
AAAGGTGCTC GCGGCAGOGC TGGTCCCCCT GGTGCTACTG GTTTCCCTGG 
TGCTGCTGGC CGAGTCGGTC CTCCTGGCCC CTCTQGAAAT GCTGGACCCC 
CTGGCCCTCC TGGTCCTGCT GGCAAAGftAG GCGGCMAGG TCCCCGTGGT 
GAGACTGGCC CTQCTGGACG TCCTGGTGAA. GTTGGTCCCC CTGGTCCCCC 
TCGCCCTGCT GGCGAGAAAG GATCCCCTGG TGCTGATGGT CCTGCTGGTG 
CTCCTGGTAC TCCCGGGCCT CAAGGTATTG CTX^30£-CG TGGTGTGGTC 
GGCCTGCCTG GTCAGAGAGG AGRGftGAGGC TTCCCTGGTC TTCCTGGCCC 
CTCTGGTGAA CCTGGCARAC AAGGTCCCTC TGGftGCAAGT GGTGAACGTG 
GICCCCCCGG TCCCATGGGC CCCOCTGGA.T TGGCTGGSCC CCCTGSTGAA 
TCTGG?JCGTG AG3GGGCTCC TGCTQCCGftA GGTTCCCCTG GftCGAGSCQG 
TTCTCCTGGC GCCAAGGGTG ACCGTGGTGA. GftCCGGCCCC CCIGGACCCC 
CTC-GTtiCTCC TGGTGCTCCT GGTGCCCCTG GCCOCGTTGG CCCTGCTGGC 
AAGAGTGGTG ATCGTGGTGA. GACIGGTCCT GCTGGTCCCG (XGGTCCCG? 
CGGCCCCGCT C^CCCCCGTG GCOCCGCCGG ACCCCA. a JGGC CCCCGTC-GTC 
AC a AGGGTGA. GACAGQCGSA CAGGGCGACA GAGGCATAAA GGGTCACCGT 
OT-CTTCTCTG GCCTCCAGGG TCCCCCTC-GC (XTCCTC-GCT CTCCTGGTGA 
AOAGCTOCC TCTGGAGCCT CTQGTCCTGC TQSTCCCCGA GGTCCCCCTG 
GCTCTCCTGG TGCTGCTGGC AAAGATGGAC TCAACGGTCT CCCTGGCCCC 
ATTGGGCCCC CTGCTCCTCp CGGTCGOjCT GGTCATGCTG GTCCTGTTGG 
TCCCCCCGGC CCTCCTGGAC CTCCTGSTCC CCCTGGTCCT CCCAGCGCTG 
GTTTCGACTT CAGCTTCCTC CCCCAGCCAC CTCAAGAGAA. GGCTCACGAT 
GGTGGCCGCT ACEACCGGGC't-3 ' 



FIG. 3B 






EP 0 992 586 A2 




FIG. 4 
CAGCTGTCTT ATGGCTATGA TGAGAAATCA ACCGGAGGAA TTTCCGTGCC
TGGCCCCATG GGTCCCTCTG GTOCTCGTGG TCTOCCTGGC GCCCCTGGTG 
CAOCTGGTCC CCAAGGCTTC CAAGGTCCCC CTGGTGAGCC TGGCGAGCCT 
GGAGCTTCAG GTCCCATGGG TCCCCGAGGT CCCDCAGGTC CCCCTGGAM. 
GAATGGAGAT GATGGGGAAG CTGGAMACC TGGTCCTGCT-3 ' 
GGA TCC ATG GGG CTC GCT GGC CCA CCG GGC GAA CCG GGT
CCG CCA GGC CCG AAA GGT CCG CGT GGC GAT AGC GGG CTC
CCG GGC GAT TCC TAA TGG ATC C



FIG. 7 
Gly-Leu-Ala-Gly-Pro-Pro-Gly-Glu-Pro-Gly-Pro-Pro-
Gly-Pro-Lys-Gly-Pro-Arg-Gly-Asp-Ser 



FIG. 8 
CAGCGGGCCA GGAAGAAGAA TAAGAACTGC CGGCGCCACT CGCTCTATGT
GGACTTCAGC GATGTGGGCT GGAATGACTG GATTGTGGCC CCACCAGGCT
ACCAGGCCTT CTACTGCCAT GGGGACTGCC CCTTTCCACT GGCTGACCAC
CTCAACTCAA CCAACCATGC CATTGTGCAG ACCCTGGTCA ATTCTGTCAA
TTCCAGTATC CCCAAAGCCT GTTGTGTGCC CACTGAACTG AGTGCCATCT
CCATGCTGTA CCTGGATGAG TATGATAAGG TGGTACTGAA AAATTATCAG
GAGATGGTAG TAGAGGGATG TGGGTGCCGC -3'
10 20 30 40 50 60

QmSYGYDEXS tggisvpgpm gpsgprglpg ppgapgpqgf qcppcepgep cascpmgprg 

70 80 90 100 110 120 

PPGPPGXNGD DGEAGXPGRP GERGPPGPQG ARGLPGTAGt* PGKKGHRGFS GLDGAXGDAG 

130 140 150 160 170 180 

PAGPXGEPGS PCENGAPGQM GPKGLPCERC RPGAPGPAGA RGNDGATGAA G PPG PTC PAG 

190 200 210 220 230 240 

PPCFPGAVGA XGEAGPQGPR. CSECPQCVRG EPGPPGPAGA AGPAGNPGAD GQPGAXGANG 

250 260 270 280 290 300 

APGIAGAPGF PCARGPSGPQ GPGGPPCPXG NSGEPGAPGS KG DTGAKGEP CPVGVQGPPG 

310 320 330 340 350 360 

PACEEGXRGA RGEPGPTGLP GPPGERGGPG SRGFPGADGV AG PXG PAGER GSPGPAGPXG 

370 380 390 400 410 420 

SPCEAGRPGE ACLPGAXGL.T GSPGSPGPDG KTGPPGPAGQ DGRPGPPGPP GARGQAGVMG 

430 440 450 460 470 480 

FPGPXGAAGS PGKAGERGVP GPPGAVGPAG XDGEAGAQGP PGPACPAGER GEQGPAGSPG 

490 500 510 520 530 540 

FQGLPGPAG? PGEAGXPGEQ GVPGDLGAPG PSGARGERGF PGERGVQGPP GPAGPRGANG 

550 560 570 580 590 600

APGNDGAXGD AGAPGAPGSQ GAPCLQCMPG ERGAAGLPGP XGDRGEAGPX GADGSFGKDG 

610 620 630 640 650 660 

VRGLTGPIGP PGPAGAPGDX GESGPSGPAG PTGARGAPGD RGEPGPPGPA GFAGPPGAEG 

670 680 690 700 710 720 

QPCAXGEPGD AGAKGDAGPP GPAGPAGPFG PIGNVGAPGA XGARGSAGPP GATGFPGAAG 

730 740 750 760 770 780 

RVGPPGPSGN AGPPGPPGPA GKEGGKGPRG ETGPAGRPGE VG PPG PPG PA GEXGSPGAEC 

790 800 810 820 830 840

PAGAPGTPGP QGIAGQRGW GLPGQRGERG FPGLPGPSGS PGKQGPSGAS CERGPPGPHG 

850 860 870 880 890 900 

PPGLAGPPG3 SGREGAPAAE GSPGREGSPG AXGDRGETGP AGPPGAXGAX GAPGPVGPAG 

910 920 930 940 950 960

XSGDSGETGP AGPAGPVGPA GARGPACPQG PRGDKGETGE QGDRGIKGHR GFSGLQGPPG 

970 980 990 1000 1010 1020

PPGSPGEQGP SGASGPAGPR GPPGSACAPG XDGLNGLPG? IGPPGPPGRT GDAGPVGPPG 

1030 1040 1050 1060 1070 1080 

PPGPPGPPGP PSACFDFSrL PQPPQEXAHD CGRYYRARSQ RARXXNXNCR RHSLYVDFSD 

1090 1100 1110 1120 1130 1140 

VQvNDa'IVAP pcyqafychg dcpfpladhl nstnhaivqt lvnsvnssip kaccvftels 

1150 1160 1170 1180 1190 1200

ATSHLYLDEY DXWLKNYQE MWEGGGCR* 
1270 1280 1290 1300 1310 1320

TCAGGCTGGT GTGATGGGAT TCCCTGCACC TAAAGGTGCT GCTGGAGAGC CCGGCAAGGC 

1330 1340 1350 1360 1370 1380

TGCAGAGCCA GGTGTTCCCG GACCCCCTGC CGCTCTCGCT CCTGCTGGCA AACATGCACA 

1390 1400 1410 1420 1430 1440

GGCTGGAGCT CAGGGACCCC CTGGCCCTGC TGGTCXGCT GGCGAGAGAG GTGAACAAGG 

1450 1460 1470 1480 1490 1500 

CCCTGCTGGC TCCCCCGGAT TCCAGGGTCT CCCTG:JTCCT GCTGGTCCTC CAGGTCAAGC 

1510 1520 1530 1540 1550 1560 

AGGCAAACCT GGTGAACAGG GTGTTCCTGG AGACC'fTGGC CCCCCTGGCC CCTCTGGAGC 

1570 1580 1590 1600 1610 1620 

AAGAGGCGAG AGAGGTTTCC CTGGCGAGCG TGGTGTGCAA GGTCCCCCTG GTCCTGCTGG 

1630 1640 1650 1660 1670 1680

ACCCCGAGGG GCCAACGGTG CTCCCGGCAA CGATGGTGCT AAGGGTGATG CTGGTGCCCC 

1690 1700 1710 1720 1730 1740

TGGAGCTCCC GGTAGCCAGG GCGCCCCTGG CCTTCXGGGA ATGCCTGGTG AACGTGGTGC 

1750 1760 1770 1780 1790 1800 

AGCTGGTCTT CCAGGGCCTA AGGGTGACAG AGGTGATGCT GGTCCCAAAG GTGCTGATGG 

1810 1820 1830 1840 1850 1860 

CTCTCCTGGC AAAGATGGCG TCCGTGGTCT GhCCC-XCCC ATTGGTCCTC CTGGCCCTGC 

1870 1880 1890 1900 1910 1920 

TGGTGCCCCT GCTGACAAGG GTGAAAGTGG TCCCA3CGGC CCTGCTGGTC CCACTGGAGC 

1930 1940 1950 1960 1970 1980 

TCGTGGTGCC CCCGGAGACC GTGGTGAGCC TCGTCCCCCC GGCCCTGCTG GCTTTGCTGG 

1990 2000 2010 2020 2030 2040 

CCCCCCTGGT GCTGACGGCC AACCTGGTGC TAAAGGCGAA CCTCGTGATG CTGGTGCCAA 

2050 2060 2070 2080 2090 2100 

AGGCGATGCT GGTCCCCCTG GGCCTGCCGG ACCCGCTGGA CCCCCTGGCC CCATTGGTAA 

2110 2120 2130 2140 2150 2160 

TGTTGGTGCT CCTGGAGCCA AAGGTGCTCG CGGCAGCGCr GGTCCCCCTG GTGCTACTGG 

2170 2180 2190 2200 2210 2220 

TTTCCCTGGT GCTGCTGGCC GAGTCGGTCC TCCTGGCCCC TCTGGAAATG CTGGACCCCC 

2230 2240 2250 2260 2270 2280

TGGCCCTCCT GGTCCTGCTG GCAAAGAAGC CGGCAAAGCT CCCCGTGGTG AGACTGCCCC 

2290 2300 2310 2320 2330 2340

TGCTGGACCT CCTGGTGAAG TTGGTCCCCC TGCTCCCCCT GGCCCTGCTG GCGAGAAAGG 

2350 2360 2370 2380 2390 2400 

ATCCCCTGGT GCTGATGGTC CTGCTGGTGC TCCTGGTACT CCCGGCCCTC AAGGTATTGC 

2410 2420 2430 2440 2450 2460 

TGGACAGCCT GGTGTGGTCC GCCTCCCTGG TCAGAGAGGA GAGAGAGGCT TCCCTGCTCT 

2470 2480 2490 2500 2510 2520 

TCCTGGCCCC TCTGGTCAAC CTGGCAAACA AGCTCCCTCT GGAGCAAGTG GTCAACCTGG 
2530 2540 2550 2560 2570 2580

TCCCCCCGGT CCCATGGGCC CCCCTCGATT GGCTGGACCC CCTGCTGAAT CTGGACGTGA 

2590 2600 2610 2620 2630 2640

GGGGGCTCCT GCTGCCGAAG CTTCCCCTCC ACGAGACGGT TCTCCTGGCG CCAAGGGTGA 

2650 2660 2670 2680 2690 2700 

CCGTGGTGAG ACCGGCCCCG CTGGACCCCC TGGTGCTCOT GGTGCTCNTC CTGCCCCTGG 

2710 2720 2730 2740 2750 2760 

CCCCGTTGGC CCTGCTGGCA AGACTGGTGA TCGTGGTGAG ACTGGTCCTG CTGGTCCCGC 

2770 2780 2790 2800 2810 2820

CGGTCCCCTC GGCCCCGCTG GCGCCCCTGG CCCCCCCGGA CCCCAACGCC CCCGTGGTGA 

2830 2840 2850 2860 2870 2880

CAAGGGTGAG ACAGGCGAAC AGCGCGACAG AGGCATAAAG GGTCACCGTG GCTTCTCTGG 

2890 2900 2910 2920 2930 2940 

CCTCCAGGGT CCCCCTGGCC CTCCTGGCTC TCCTGCTGAA CAAGGTCCCT CTGGAGCCTC 

2950 2960 2970 2980 2990 3000

TGGTCCTGCT GGTCCCCGAG GTCCCCCTGG CTCTGCTGGT GCTCCTGGCA AAGATGGACT 

3010 3020 3030 3040 3050 3060 

CAACGGTCTC CCTGGCCCCA TTGGGCCCCC TGGTCCTCGC GGTCGCACTC GTGATGCTGG 

3070 3080 3090 3100 3110 3120 

TCCTGTTGGT CCCCCCGGCC CTCCTGGACC TCCTGGTCCC CCTGGTCCTC CCAGCGCTGG 

3130 3140 3150 3160 3170 3180 

TTTCCACTTC AGCTTCCTCC CCCAGCCACC TCAAGAGAAG GCTCACGATG GTGGCCGCTA 

3190 3200 3210 3220 3230 3240 

CTACCGGGCT agatccCAGC GGGCCAGGAA GAAGAATAAG AACTGCCGGC GCCACTCGCT 

3250 3260 3270 3280 3290 3300

CTATGTGGAC TTCAGCGATG TGGGCTGGAA TGACTGGATT GTGGCCCCAC CAGGCTACCA 

3310 3320 3330 3340 3350 3360

GGCCTTCTAC TGCCATGGGG ACTGCCCCTT TCCACTGGCT CACCACCTCA ACTGAACCAA 

3370 3380 3390 3400 3410 3420 

CCATGCCATT CTGCAGACCC TGGTCAATTC TGTCAATTCC AGTATCCCCA AAGCCT57TG 

3430 3440 3450 3460 3470 3480 

TGTGCCCACT GAACTGACTC CCATCTCCAT CCTGTACCTG GATGAGTATG ATAAGGTGGT 

3490 3500 3510 3520 3530 3540
ACTGAAAAAT TATCAGGAGA TCGTAGTAGA GGCATGTGGG TGCCGCTAAa agctt 
10 20 30 40 50 60

q^OeS TGGISVPGPM GPSGPRGLPC PPGAFGPQGF QGPFCEFGEP GASGPMGPRG 

70 80 90 100 110 120



PPGPPGKNGD DGEAGKPGRP GERGPPGPQG ARGLPGTACLi PGMX.GHHGFS GLDGAKCWG 

130 140 150 160 170 180

PAGPXGEPGS PGQJGAFGQH GFRCLPGERG RPGAPGPAGA RGNDGATGAA CPFGPTCPAG 

190 200 210 220 230 240

PPGFPGAVGA KGEAGPGCPR GSEGPSGVRG- EPGPPGPAGA AGPAGNPGAD GQPGAKGANG 

250 260 270 280 290 300

APGIAGAPGF PGARGPSGPQ GPGGPPGPKG HSGEPCAPGS XGDTGAKGEP GPVGVQGPPG 

310 320 330 340 350 360

PAGEEGXRGA RGEFGPTGLP GPPGE3GCPC rRGFPGADGV AGPXGPAGER GSPGPAGPXG 

370 380 390 400 410 420 

SPGEAGBPGE AGLPGAXGLT GSPGSPGPDG KIGPPGPAGQ DGRPCPPCPF GARGQAGVMG 

430 440 450 460 470 480 

FPGPKGAAGE PGKAGERGVP GPPGAVGPAG KDGEAGAQGP PG PAG PAGER GSQGPAGSPG 

490 500 510 520 530 540 

FQGLPGPAG? PGEAGKPGEQ GVFGDLGAPG PSGARGERGF PGERGVCGPP GPAGPRGANG 

550 560 570 580 590 600 

APGNEGAXGD. AGAPGAPGSQ GAPGLQGMFG ERGAAGLPGP KGDRGDAGPK GAEG5PGXDG 

610 620 630 640 650 660

VRGLTGPIGP FGPAGAPGDK GESGPSGPAG PTCARGAPGD RGEPGPPGPA GFAGPPCADG 

670 680 690 700 710 720 

QPGAXGEFGD AGAXGQAGPP GPAGPAGPPG PIGNVGAPCA XGARGSAGP? GATGFPGAAG 

730 740 750 760 770 780 

RVGPPGPSGN AGPPGPPGPA GKEGGKGPHG ETGPAGRFCE VGPPGPPGPA GEXGSPGADG 

790 800 810 820 830 840 

PAGAPGTPGP QGIAGORGVV GLPGQRGERG FPGLFGPSCS PGKQGPSGAS GERGPPGFKG 

850 860 870 880 890 900 

PPGLAGPPGE SGREGAPAAE GSPGRDGSPG AKGDRGETG? AGPPGAXGAX GAPGFVGPAG 

910 920 930 940 950 960 

KSGDRGETGP AGPAGFVGPA GARGPAGPQG PRGDKGETGE QGDRGIKGKH GFSGLCGPPG 

970 980 990 1000 1010 1020

PPGSPGEQGP SGASGPAGPR GPPGSAGAPG KDGENGLPGP IGPPGPRGRT GDAGPVGPPG 

1030 1040 1050 1060 1070 1080

PPGPPGPPGP PSAGFDFSFL PQPPQEXAHD GGRYYRARSA LOTWCFSST EXMCCVRQLY 

1090 1100 1110 1120 1130 1140 

IDFRXDLGWX WIHEPXGYHA NFCLCPCPYI WSU7JQYSKV LALYNQHNPG ASAAPCCVPQ 

1150 1160 1170 1180 1190 1200 
ALEPLPrVYY VGRXPKVEQL SNMTVRSCKC S* 
10 20 30 40 50 60

gggaaggatt tccatttccC AGCTGTCTTA TGGCTATGAT GAGAAATCAA CCGGAGCAAT 

70 80 90 100 110 120

TTCCGTGCCT GGCCCCATGG CTCCCTCTCC TCCTCGTGGT CTCCCTCGCC CCCCTGCTGC 

130 140 150 160 170 180

ACCTGGTCc2 CAAGGCTTCC AAGCTCCCCC TGCTGrvGCCT GGCGAGCCTG WGCTTCACC 

190 200 210 220 230 240

TCCCATCGGT CCCCGAGGTC CCCCAGCTCC CCCTG:2AAAG AATGGAGATG ATGGGGAAGC 

250 260 270 280 290 300

TCGAAAACCT GGTCGTCCTG GTGAGCGTGG GCCTCCTGGG CCTCACGGTG CTCGAGGATT 

310 320 330 340 350 360

GCCCGGAACA GCTGGCCTCC CTGGAATGAA GGGACACA6A GSTTTCAGTG GTTTGGATCG 

370 380 390 400 410 420 

TGCCAAGGCA GATGCTCGTC CTGCTGCTCC TAAGGSTGAC CCTGGCAGCC CTCCTGAAAA 

430 440 450 460 470 480

TGGAGCTCCT GGTCAGATGG GCCCCCCTGG CCTGCCTGGT GAGACAGGTC GCCCTGGAGC 

490 500 510 520 530 540

CCCTGGCCCT GCTGGTGCTC GTGGAAATGA TGCTGCTACT GGTGCTGCCG GGCGCCCTGG 

550 560 570 580 590 600

TCCCACCGGC CCCGCTGGTC CTCCTGGCTT CCCTCXrrGCT GTTGGTGCTA AGGGTGAAGC 

610 620 630 640 650 660 

TGGTCCCCAA GCGCCCCGAG CCTCTGAAGG TCCCCAGGGT GTGCGTGGTG AGCCTGGCCC 

670 680 690 700 710 720 

CCCTGGCCCT GCTGGTGCTG CTGGCCCTGC TGGAAACCCT GGTGCTGATG GACAGCCTGG 

730 740 750 760 770 780 

TGCTAAAGGT GCCAATGGTG CTCCTGGTAT TGCTGGTGCT CCTGGCTTCC CTGGTGCCCG 

790 800 810 820 830 840

AGGCCCCTCT GGACCCCAGC GCCCCGGCGG CCCTCCTGGT CCCAAGGGTA ACAGCGGTGA 

850 860 870 880 890 900 

ACCTGGTGCT CCTGGCAGCA AAGGAGACAC TGGTGCTAAG GGAGAGCCTG GCCCTGTTGG 

910 920 930 940 950 960 

TCTTCAAGGA CCCCCTGGCC CTGCTGGAGA GGAAGCAAAG CGAGGAGCTC GACGTGAACC 

970 980 990 1000 1010 1020

CGGACCCACT GGCCTGCCCG GCCCCCCTGG CGAGCGTGGT GGACCTGGTA GCCGTGGTTT 

1030 1040 1050 1060 1070 1080 

CCCTGGCGCA GATGCTGTTC CTGGTCCCAA GCGTCCCGCT CCTGAACGTG GTTCTCCTGG 

1090 1100 1110 1120 1130 1140 

CCCCCCTGCC CCCAAAGGAT CTCCTGGTCA AGCIGGTCGT CCCGCTCAAG CTCGTCTCCC 

1150 1160 1170 1180 1190 1200 

TCGTGCCAAG GGTCTGACTG GAACCCCTGG Cf.GCCCTGCT CCTCATGCCA AAACTGCCCC 

1210 1220 1230 1240 1250 1260 

CCCTGGTCCC CCCGCTCAAG ATGGTCCCCC CGGACCCCCA GGCCCACCTG GTGCCCGTGG 



FIG. 16A






1270 1280 1290 1300 1310 1320

TCAGGCTGGT GTGATGGGAT TCCCTGGACC TAAAGGTGCT GCTGGAGAGC CCGGCAAGCC

1330 1340 1350 1360 1370 1380 

TGGAGAGCGA GGTGTTCCCG GACCCCCTCG CGCTCTCGGT CCTCCTGCCA AAGATGGAGA 

1390 1400 1410 1420 1430 1440 

GGCTGGACCT CAGGGACCCC CTGGCCCTCC TGGTCCCCCT CGCCAGACAG CTCAACAAGG 

1450 1460 1470 1480 1490 1500 

CCCTGCTGGC TCCCCCGGAT TCCAGGGTCT CCCTGGTCCT GCTGGTCCTC CAGGTGAAGC 

1510 1520 1530 1540 1550 1560

AGGCAAACCT GGTGAACAGG GTGTTCCTGG AGACC7TGGC GCCCCTGGCC CCTCTGGAGC 

1570 1580 1590 1600 1610 1620

AAGAGGCGAG AGAGGTTTCC CTGGCGAGCG TGGTGTGCAA GGTCCCCCTG GTCCTGCTGG 

1630 1640 1650 1660 1670 1680

ACCCCGAGGG GCCAACGGTG CTCCCGGCAA CGATCCTGCT AAGGCTCATG CTGGTGCCCC 

1690 1700 1710 1720 1730 1740 

TGGAGCTCCC GGTAGCCAGG GCCCCCCTGG CCTTCAGGCA ATGCCTGGTG AACCTGGTGC 

1750 1760 1770 1780 1790 1800

AGCTGGTCTT CCAGGGCCTA AGCGTCACAG AGCTGJiTGCT GGTCCCAAAG GTCCTGATCG 

1810 1820 1830 1840 1850 1860

CTCTCCTGGC AAAGATGGCG TCCGTGGTCT GACCC.XXCC ATTGGTCCTC CTGGCCCTGC 

1870 1880 1890 1900 1910 1920

TGGTCCCCCT GCTGACAAGG GTGAAAGTGG TCCCAGCGGC CCTGCTGGTC CCACTGGAGC 

1930 1940 1950 1960 1970 1980 

TCGTGGTGCC CCCGGAGACC GTGGTGAGCC TGGTCCCCCC GGCCCTGCTG GCTTTGCTCG 

1990 2000 2010 2020 2030 2040 

CCCCCCTGGT GCTGACGGCC AACCTGCTGC TAAACGCGAA CCTGGTGATG CTGGTGCCAA 

2050 2060 2070 2080 2090 2100

AGGCGATGCT GGTCCCCCTG GGCCTGCCGG ACCCGCTGCA CCCCCTGGCC CCATTGGTAA 

2110 2120 2130 2140 2150 2160 

TCTTGGTGCT CCTGGAGCCA AAGGTGCTCG CGGCAIJCGCT GGTCCCCCTG GTGCTACTGG} 

2170 2180 2190 2200 2210 2220 

TTTCCCTGGT GCTGCTGGCC GAGTCGGTCC TCCTOGCCCC TCTCGAAATG CTGGACCCCC 

2230 2240 2250 2260 2270 2280 

TCGCCCTCCT GGTCCTCCTG .GCAAAGAAGG CGGCAAAGGT CCCCGTCGTC AGACTGCCCC 

2290 2300 2310 2320 2330 2340 

TGCTGGACGT CCTGGTGAAG TTGGTCCCCC TGGTCCCCCT GGCCCTGCTG GCGAGAAAGG 

2350 2360 2370 2380 2390 2400 

ATCCCCTGGT GCTGATGGTC CTGCTC-GTGC TCCTGCTACT CCCGGGCCTC AAGGTATTGC 

2410 2420 2430 2440 2450 2460 

TCSACAGCCT CGTGTGGTCG GCCTGCCTCC TCAGASAGGA GAGAGAGGCT TCCCTGGTCT 

2470 2480 2490 2500 2510 2520 
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TtccccS? cccatgggcc cccereSS ggogcaccc crionSS ciocmcSS 

,..„ , fil0 2620 2630 2640 

oococc^ ocroccS gt^ccS acgaga^ggt itccgccg ccaagcg™ 

,670 2660 2690 2700 

ccgtggtgag accggccccg cggaccccc r^crccr ggtgcic** gtccccctgo 

7720 2730 2740 2750 2760 

CCCCGT4S CCTCCTGGCA AGACTGG7GA TCGTGGTGAG ACTGCTCCTG CTGGTCCCGC 

7780 2790 2800 2810 2820 

CGGTCCcSc CGCCCcLw GCGCCCGTCG CCCCGCCGGA CCCCA*GGCC CCCGTGGTGA 

2830 2840 2850 2860 2870 2880

CAAGGCTCAG ACAGGCGAAC AGGCCCACAG AGGCA7AAAG GGTCACCGTG GCTTCTCTCG 

2890 2900 2910 2920 2930 2940

CCTCCAGGGT CCCCCTGGCC CTCCTGGCTC TCCTCGTGAA CAAGGTCCCT CTGGAGCCTC 

2950 2960 2970 2980 2990 3000 

TGGTCCTGCT GGTCCCCGAG GTCCCCCTGG CTCTGCTGGT GCTCCTGGCA AAGATGGACT 

3010 3020 3030 3040 3050 3060 

CAACGGTCTC CCTGGCCCCA TTGGGCCCCC TGGTCCTCGC GGTCGCACTG GTGATGCTGG 

3070 3080 3090 3100 3110 3120 

TCCTGTTGGT CCCCCCGGCC CTCCTGGACC TCCTGOTCCC CCTGGTCCTC CCAGCGCTGG 

3130 3140 3150 3160 3170 3180

TTTCGACTTC AGCTTCCTCC CCCAGCCACC TCAAGAGAAG GCTCACGATG GTGGCCGCTA 

3190 3200 3210 3220 3230 3240 

CTACCGGGCT agatCCGCCC TGGACACCAA CTATTOCTTC AGCTCCACGG AGAAGAACTG 

3250 3260 3270 3280 3290 3300 

CTGCGTGCGG CAGCTGTACA TTGACTTCCG CAAGGACCTC GGCTGGAAGT GGATCCACGA 

3310 3320 3330 3340 3350 3360 

CCCCAAGGGC TACCATGCCA ACTTCTGCCT CGGGCCCTGC CCCTACA7TT GGAGCCTGGA 

3370 3380 3390 3400 3410 3420 

CACGCAGTAC AGCAAGGTCC TGGCCCTGTA CAACCAGCAT AACCCGGGCG CCTCGGCGGC 

3430 3440 3450 3460 3470 3480 

GCCGTGCTGC CTGCCGCAGG CGCTGGAGCC GCTGCCCATC GTGTACTACG TGGGCCGCAA 

3490 3500 3510 3520 3530 3540

GCCCAAGGTG GAGCACCTGT CCAACATGAT CGTGCGCTCC TGCAACTGCA CCTGAtctag 

3550 3560 3570 3580 3590 3600
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10 20 30 40 - 50 60 

OLSYGYDEXS TGCISVPCPM GPSGPRGLPG PPGAPGPCCF QGPFCEPCEP GASGPMGPRG 

70 60 90 100 HO 120 

PPGPPGKNGD DCEAGKPCRP GERGPPGPCG ARCLFGTACL PGMKCHRGFS CLCCAXCQAG 

^0 140 ISO . 160 no 180 

PAGPKGEPGS PCENGAPGQH CPRGLPGERG RPGAPGPAGA RGHDGATCAA GPPGPTGPAC 

190 200 210 220 230 2<0 

PPGr PGAVGA XGEAGPQGPR CSECPQGVRG EPGPPGPACA ACPAGNPGAD GQPGAXGANG 

250 260 270 2B0 290 300 

APC1ACAPGF PGXRCPSGPQ GPGGPPGFXG NSGEPGAPGS XCDTGAXGEP GPVGVCGPPG 

310 320 330 340 350 360 

PAGEEGXRGA RGEPGPTGLP GPPoERGGPG SRGFPCADGV AGPKGPAGER GSPGPAGPKG 

370 380 390 400 410 420 

SPCEAGRPGS AGL.PGAXGLT GSPGSPGPDG XTGPPGPAGQ DGRPGPPGPP GARGOAGVM3 

430 440 450 460 470 480 

FPGPKGAAGE PGKAGERGVP GPPGAVGPAG XD3EAGAQG? PC PAG PAGER CEQGPAGSPG 

490 500 510 520 530 540 

FQGLPGPAGP PGEAGXPGEQ GVPGDLGAPG PSCARGERGF PGERGVQGPP GPAGPRGANG 

550 560 570 580 590 600 

APGNDGAXGD AGAPGAPGSQ GAPGLQGMPG ERGAAGLPGP KGDRGDAGPK GADGSPGKDG 

S10 620 630 6<0 650 660 

VRGLTCPIGP PCPAGAPGDX GESGPSGPAG PTGARGAPGD RGEPGPPGPA GFAGPPGAD3 

070 680 690 700 710 720 

QPGAXGEPGD AGAKGDAGPP GPAGPAGPPG PIGNVJ3APGA KGARGSAGPP GATGFPGAAG 

730 740 750 760 770 780 

RVGPPGPSGN AGPPGPPGPA GKEGGXGPRG ETGPAGRPGE VGPPGPPGPA GEKCSPGADG 

790 800 810 820 830 ' 840 

PAGAPGTPGP QGIAGQRGW GLPGQRGERG FPGLPGPSGS PGKQGPSGAS GERGPPGPMG 

850 660 870 880 890 900 

PPGLAGPPGE SGREGAPAAE CSPGRECSPG HXGDRCETG? AGPPGAXGAX GAPGPVCPAC 

910 920. 930 940 950 960 

KSGDRGETGP AGPAGPVGPA GARGPAGPQG PRGDKGSTGE QGDRGIKGKR GFSGLQGPFG 

■ 970 550 990 1000 1010 1020 

PPGSPGEQGP SGASGPAGPR GPPGSAGAPC KDGU5GWGP IGPPGPRGRT CDACPVGPPG 

1030 10<0 10S0 106D 1070 1080 

PPGPPCPPGP PSAGFDFSFL PQPPQEKAKD GCRYYRARSD EASGIGPEVP DDRDrEPSLG 

1090 1100 1110 1120 1130 1140 

PVCPFRCQCH LRWQCSDLG LDaVPXDLP? DTTLLDLQWN" XITEIKDGDF KNLKNLHALI 

1150 1160 1170 1180 1190 3200 

LV>i>JKISXVS PGAPTPLVXL ERLYLSKNQI- KELPEXMPXT LQELRAHENE 1TKVRKVTFN 

1210 1220 1230 1240 1250 1260 

CLKQMTVIEL GTNPLXSSCI ENCAFQGWXX LSYIR1ADTN ITSIPQC-LPP SLTEUHL-DGN 
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1270 1280 1290 1300 . 1310 1320 

KISRVDAASL KGLMNtAXLC LSFNSISAVD NCSLWJTPHL RELHLOOflO, TRVPGGLAXH 

1330 1340 1350 1360 1370 1380 

XYIQWYLHN KNISWGSSD FCPPGKNTKK ASYSGVSLfS NPVQYWEIQP STFRCVYVRS 

1390 1400 1410 1420 1430 1440 
AIQLGNYX- 
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CM CK TCr TAT GGC TAT GAT a* TC'n ACC CCA GGA ATT TCC GTG CCT GGC CCC ATG 
63 



GCT CCC TCP GGT CCT cS GGT CTC CCT GGC CCC CCT GGT GCA CCT GGT CCC CAA GGC TTC 

138 !47 ' 156 16S 174 

CAA GGT CCC CCT GGT GAG CCT GGC GAG CCP GGA GCT TCA GGT CCC ATG GGT CCC CGA GGT 

1Rq 198 207 216 225 234 

CCC CCA GGT CCC CCT GGA AAG AAT GGA GST GAT GGG GAA GCT GGA AAA CCT GGT CGT CCT 

249 253 267 276 235 294 

GGT GAG CCT GGG CCT CCf GCG CCT CAG GOT. GCT GGA TTG CCC GGA ACA GCT GGC CTC 

309 318 327 336 345 354 

CCT GGA ATG AAG GGA CAC ACA GGT TTC AST GGT TTG GAT GGT CCC AAG GGA GAT GCT GGT 

369 373 387 396 405 414 

CCT GCT GGT CCT AAG GGT GAG CCT CGC AGC CCT GGT GAA AAT GGA. GCT CCT GGT CAG ATG 

4 og 438 447 456 465 474 

GGC CCC CGT OGC CTC CCT GGT GAG AGA GGT CGC CCT GGA GCC CCT GGC CCT GCT GGT GCT 

489 498 507 515 525 534 

CCT GGA AAT GAT GCT GCT ACT GO? GOT GCC GGG CCC CCT GGT CCC ACC GGC CCC GCT GGT 

549 558 567 576 585 S94 

CCT CCT GCC TTC CCT GGT GCT GTT GGT GCT AAG GGT GAA GCT GGT CCC CAA GGG CCC CGA 

509 618 627 636 645 654 

GGC TCT GAA GGT CCC CAG GGT GTG CGT GGT GAG CCT GGC CCC CCT GGC CCT GCT GGT GCT 

669 678 637 695 705 714 

GCT GGC CCT GCT GGA AAG CCT GGT GCT GAT GGA CAG CCT GGT GCT AAA GGT GCC AAT GGT 

729 733 747 756 765 774 

GCT CCT GGT ATT GCT CGT GCT CCT GGC TTC CCT GGT GCC CGA GGC CCC TCT GGA CCC CAG 

785 798 807 616 S25 834 

GGC CCC GGC GGC CCT CCT GGT CCC AAG GGT AAC ACC GGT GAA CCT GGT GCT CCT CGC AGC 

£49 858 867 876 885 894 

AAA OGA GAC ACT GGT CCT, AAG GGA GAG CCT GGC CCT GTT GGT GTT CAA GGA CCC CCT GGC 

90S 913 927 936 945 954 

CCT GCT GGA GAG CAA GGA AAG CGA GGA GCT CGA GGT GAA CCC GGA CCC ACT GGC CTG CCC 

969 978 ' 937 996 ao05 1014 

GGA CCC CCT GGC GAG CGT GGT CGA CCT GGT AGC CGT GGT TTC CCT GGC GCA GAT GGT GTT 

1029 1038 1047 1056 1065 1074 

GCT GGT CCC AAG CGT CCC GCT GGT GAA CGT GGT TCT CCT GGC CCC GCT GGC CCC AAA GGA 

1039 1 098 1107 1116 n 2 5 >i 34 

TCT CO? CGT CAA GCT GGT CGT CCC GGT GAA GCT GGT CTG CCT GGT GCC AAg'gGT CTC ACT 

"49 1*58 1167 ai76 US5 1194 

CGA AGC OCT GGC ACC. CCT GGT CCT GAT GGC AAA ACT GGC CCC CCT GGT CCC GCC GGT CAA 

1209 1218 1227 1236 1245 1254 

GAT OST CCC CCC CGA CCC CCA GGC CCA CCT GGT GCC CGT GGT CAG GCT GGT GTG ATG GGA 
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126? 1272 -.257 129-5 3 ->0f, 1314 

I1C CCT CGA IVT AAA CGT GOT GCT CGA" GAG CCC CGC AAC GCT GGA GAG CGA GGT CTT CCC 

\2.20 133S 1 j4 7 135o 1365 1374 

gga ccc ccr ax (xrr crc gg? CCT CCT GGC AAA GAT CCA GAG GCT CGA GCT CAG GGA CCC 

1339 1393 14C7 1416 1425 1434 

CCT GGC CCT GCT GGT CCC" GCT GGC GAG AGA GGT GAA CAA GGC CCT GCT GGC TCC CCC GGA 

1449 14S3 1467 1476 1485 1494 

TTC CAG GGT CTC CCT GCT CCT GCT GGT CCT CCA GGT GAA GCA GGC AAA CCT GGT GAA CAG 

1509 1518 1527 1536 1545 1554 

GGT CTT CCT GGA GAC CTT GGC GCC CCT GGC CCC TCT GGA GCA AG*, GGC GAG AGA GGT TTC 

1569 1S78 1537 1SS6 1605 1614 

CCT GGC GAC CGT GGT GTG CM GGT CCC CCT GGT CCT OCT GGA CCC CGA GGG GCC AAC GGT 

1629 1638 1647 1656 1665 1674 

GCT CCC GCC AAC GAT GGT GCT AAC CGT GAT GCT GGT GCC CCT GGA GCT CCC GGT AGC CAG 

1689 1698 1707 1716 1725 1734 

GGC GCC CCT GGC CTT CAG GGA ATG CCT GGT CAA CGT GGT GCA GCT GGT CTT CCA GGG CCT 

1749 17S8 1767 1776 1785 1794 

AAG GGT GAC AGA GGT GAT GCT CGT CCC AAA GGT GCT GAT GGC TCT CCT GGC AAA GAT GGC 

1309 1818 1827 1536 1845 1854 

GTG CGI' GGT CTG ACC GGC CCC ATT GGT CCT CCT GGC CCT GCT GGT GCC CCT GGT GAC AAG 

13S9 13/8 1S87 1S96 1905 1914 

GGT GAA AGT GGT CCC AGC GGC CCT GCT GGT CCC ACT GGA GCT CGT GGT GCC CCC GGA GAC 

1929 1S38 1947 1956 1965 1974 

CGT GGT GAG CCT GGT CCC CCC GGC CCT GCT GGC TTT GCT GGC CCC CCT GGT GCT GAC GGC 

1989 1993 2007 2016 2025 2034 

CAA CCT CGT GCT AAA GGC GAA CCT GGT GAT CCT GGT GCC AAA GGC GAT GCT GGT CCC CCT 

2049 2058 2067 2075 20S5 '094 

GGG CCT GCC GGA CCC GCT GGA CCC CCT GGC CCC ATT GGT AAT GTT GGT GCT CCT GGA GCC 

2109 2U8 2127 2136 2145 21S4 

AAA GST GCT CGC CGC AGC GCT GGT CCC CCT GGT GCT ACT GGT TTC CCT GGT GCT GCT GGC 

2169 2178 2187 2196 2205 2214 

CGA GTC GGT CCT CCT CGC CCC TCT GGA AAT GCT GGA CCC CCT GGC CCT CCT GGT CCT GCT 

2229 2233 • 2247 2256 2265 2274 

GGC AAA GAA GGC GGC AAA GGT CCC CGT GGT GAG ACT GGC CCT GCT GGA CGT CCT GGT GAA 

2289 2298 2307 2316 2325 2334 

GTT COT CCC CCV CGT CCC CCT GGC CCT GOT GGC GAG AAA GGA TCC CCT GGT GCT GAT GGT 

2349 2.-153 2367 2376 2355 2 394 

CCr CCT GGT CCT CCT RST ACT CCC CGG CCT CAA GGT ATT GCT GGA CAG CGT GGT GTG GTC 

->•">? 2413 ' 2427 2436 2445 2454 

GoC CTG CCT CGT CAG AGA GGA GAG AGA GGC TTC CCT CGT CTT OCT GGC CCC TCT GGT GAA 

2,169 247S 2487 2496 2505 2514 

ccr ccc aaa caa ggt ccc tct gga gca agt ggt gaa cgt ggt ccc ccc ggt ccc atg ggc 
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2S29 2538 2547 25S6 2 J0 5 2574 

CCC CCf CGA VIC OCT CCA CCC CCT GCP GM TCT GGA CGT GAG GGG GCT CCT GCT GCC GAA 

2589 2598 2607 2616 2625 2634 

GGT TCC CCT GGA CGA GAC GGT TCT CCT GGC GCC AAG GGT GAC CGT GGT GAG ACC GGC CCC 

2649 265? 2667 267S 2605 2654 

GCT GGA CCC CCT GGT C-TT CCT GGT CCT CCT GGT GCC CCT GGC CCC GTT GCC CCT GCT GGC 

2709 27X3 2727 2736 2745 2754 

AAG AC" CGT CAT CGT GOT GAG ACT CGT CCT GCT' GGT CCC GCC GGT CCC GTC GGC CCC GCT 

2769 2778 27S7 2796 2S05 2814 

GGC GCC CGT CGC CCC GCC CGA CCC CAA GGC CCC CGT GGT GAC AAG GGT GAG ACA GGC GAA 

2829 2838 2847 2356 2865 2874 

CAG GGC GAC AGA GGC ATA AAG GGT CAC CGT GGC TTC TCT GGC CTC CAG GGT CCC CCT GGC 

2889 2898 2907 2916 2925 2934 

CCT CCr GGC TCT CCT GGT GAA CAA CGT CCC TCT GGA GCC TCT GGT CCT GCT GGT CCC CGA 

2949 2953 2967 2976 2985 2994 

GGT CCC CCT GGC TCT GCT GGT GCT CCT GGC AAA GAT GGA CTC AAC GGT CTC CCT GGC CCC 

3009 3018 3027 3036 3045 3054 

ATT GGG CCC CCT CGT CCT CGC GGT CGC ACT GGT GAT GCT GGT OCT GTT GGT CCC CCC GGC 

3069 307S 3087 3095 3105 3114 

CCT CCT CGA CCT CCT GGT CCC CCT GGT CCT CCC AGC GCT GGT TTC GAC TTC AGO TTC CTC 

3129 3138 3147 3156 3165 3174 

CCC CAG CCA CCT CAA GAG AAG GCT CAC GAT GGT GGC CGC TAC TAC CGG GCT AGA TCC GAT 

3169 31*3 2207 3216 3225 3234 

GAG GCT TCT GGG ATA GCC CCA, CAA GTT CCT GAT GAC CGC GAC TTC GAG CCC TCC CTA.GGC 

3249 32S3 3267 3276 3285 3294 

CCA GTG TGC CCC TTC CGC TCT CAA TCC CAT CTT CGA GTG GTC CAG TGT TCT GAT TTC GGT 

3309 3318 3327 3336 3345 3354 

CTG GAC AAA GTG CCA AAG GAT CTT CCC CCT GAC ACA ACT CTC CTA GAC CTG CAA AAC AAC 

3369 3378 3387 3396 3405 3414 

AAA ATA ACC GAA ATC AAA GAT GGA GAC TTT AAG AAC CTG AAG AAC CTT CAC CCA TTC ATT 

3429 3438 3447 34S6 3465 3474 

CTT GTC AAC AAT AAA ATT AGC AAA GTT AGT CCT GGA GCA TTT ACA CCT TTG GTG AAG TTC 

3489 345S - 3507 3516 3525 3534 

CAA CGA CTT TAT CTG TCC AAG AAT CAG CTG AAG G AH TTG CCA GAA AAA ATG CCC AAA ACT 

3549 J553 3567 3S76 3585 3594 

CTT CAG GAG CTG CGT GCC CA? GAG AAT GAG ATC ACC AAA GTG CGA AAA GTT ACT TTC AAT 

3609 3618 3627 3636 3645 3654 

GGA CTG AAC CAG ATG ATT GTC ATA GAA CTG GGC ACC AAT CCG CTG AAG AGC TCA GGA ATT 

36-69 3678 ' 3637 3696 3705 3714 

GAA AAT GGG CCT TTC CAG CCA ATG AAG AAG CTC TCC TAC ATC CGC ATT CCT GAT ACC AAT 

"29 3735 37-17 3755 3765 3774 

ATC ACC AGC ATT CCT CAA GCF CTT CCT CCT TCC CTT ACC GAA TTA CAT CTT GAT GGC AAC 
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10 20 30. 40 .50 .60 

gggaaggatc tccattcccC agctgtctta tgoctatgat CAGAaatcaa ccggagCaat 

70 80 90 100 110 120 

TTCCGTGCCT GGCCCCATGG GTCCCTCTGG TCCTCGTGGT CTCCCTGGCC CCCCTOGTOC 

ACCTCGTCCC CAAGGCT7CC AAGGTCCCCC TGGTGAGCCT CGCGAGCCTG CACCTTCACC 

tcccatgSt ccccgagotc" ccccacgtcc ccctccaaa? aa-t^catc" atggggaagc 

TGGAAAACCT GGTCGTCCTG GTGAGCGTGG GCCTCCTCGG CCTCAGcSS CTCGAGGATT 

■»m 320 330 340 350 360 

GCCCGCWCA GCTGGCCTCC CTGGAATGAA GGGACACAGA GGTTTCAGTC CnTTCGATGG 

370 380 390 400 410 420

TGCCAAGGGA GATGCTGGTC CTGCTGGTCC TAAGGGTGAG CCTCGCAGCC CTGGTGAAAA 

430 440 450 460 470 480 

TCGAGCTCCT GGTCAGATCG GCCCCCCTGG CCTGCCTGGT GAGAGAGGTC GCCCTGGAGC 

490 500 510 520 530 540 

CCCTGGCCCT GCTGGTGCTC GTGGAAATCA TGGTGCTACT GGTGCTGCCG GGCCCCCTGG 

550 560 570 580 590 600 

TCCCACCGGC CCCGCTGGTC CTCCTGGCTT CCCTGGTGCT CTTGGTCCTA AGGGTGAAGC 

610 620 630 640 650 660

TGGTCCCCAA GGGCCCCGAG GCTCTCAAGG .TCCCCAGGGT GTGCGTGGTG AGCCTGGCCC 

670 680 690 700 710 720 

CCCTGGCCCT GCTGGTGCTC CTGGCCCTGC TGGAAACCCT GGTGCTGATG GACAGCCTGG 

730 740 750 760 770 780 

TGCTAAAGGT GCCAATGGTG CTCCTGGTAT TGCTGGTGCT CCTGCCTTCC CTGGTGCCCG 

790 800 810 820 830 840

AGGCCCCTCT GGACCCCAGG GCCCCGGCCG CCCTCCTGCT CCCAAGGGTA ACAGCGGTGA 

850 860 870 880 890 900 

ACCTGGTGCT CCTCGCAGCA AAGGAGACAC TCCTGCTAAG GGAGAGCCTG CCCCTGTTGG 

910 920 930 940 950 960

TGTTCAAGGA CCCCCTGGCC CTCCTGGAGA GGAAGGAAAG CGAGGAGCTC GAGGTGAACC 

970 980 990 1000 1010 1020

CGGACCCACT CGCCTGCCCG GACCCCCTCG CGAGCCTGGT GGACCTGGTA GCCGTGGTTT 

1030 1040 1050 1060 1070 1080

CCCTGGCGCA GATGGTGTTG CTCGTCCCAA GGGTCCCGCT GGTGAACGTG GTTCTCCTGG 

1090 1100 1110 1120 1130 1140 

CCCCGCTGGC CCCAAAGGAT CTCCTGGTGA AGCTGGTCGT CCCGGTGAAG CTCGTCTGCC 

1150 1160 1170 1180 1190 1200 

TGGTGCCAAC GCTCTGACTC GAACCCCTGG CAGCCCTGGT CCTGATGGCA AAACTGGCCC 

1210 1220 1230 1240 1250 1260 

CCCTC-GTCCC GCCGGTCAAG ATGGTCC-CCC CGGACCCCCA GGCCCACCTG GTGCCCGTGG 
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1270 1280 1290 1300 . 1310 1320 

TCAGGCTGGT GTGATGGGAT TCCCTGGACC TAAAGCTGCT CCTCGAGAGC CCGGCAAGGC 

1330 1340 1350 1360 1370 1380

TGGAGAGCGA GGTGTTCCCG GACCCCCTGG COCTCTCCCT CCTGCTGGCA AAGATGGAGA 

1390 1400 1410 1420 1430 1440 

GGCTGGAGCT CAGGGACCCC CTGGCCCTGC TGCTCCCGCT GGCGAGAGAG GTGAACAACC 

1450 1460 1470 1480 1490 1500 

CCCTGCTGGC TCCCCCGGAT TCCAGGGTCT CCCTGGTCCT CCTGGTCCTC CAGGTGAAGC 

1510 1520 1530 1540 1550 1560 

AGGCAAACCT GGTGAACAGG GTGTTCCTGG AGACCTTGGC GCCCCTGGCC CCTCTGCAGC 

1570 1580 1590 1600 1610 1620 

AAGAGCCGAC AGAGGTTTCC CTGGCGAGCG TGCTGTGCAA GGTCCCCCTG GTCCTGCTGG 

1630 1640 1650 1660 1670 1680

ACCCCGAGGG GCCAACGGTG CTCCCGGCAA CCATGGTGCT AAGGGTGATC CTGGTQCCCC 

1690 1700 1710 1720 1730 1740

TCGAGCTCCC GGTAGCCAGG GCGCCCCT6G CCTTCAGGGA ATGCCTGGTG AACGTGGTGC 

1750 1760 1770 1780 1790 1800 

AGCTGCTCTT CCAGGGCCTA AGGGTGACAG AGGTGATGCT GGTCCCAAAG GTGCTGATGG 

1810 1820 1830 1840 1850 1860 

CTCTCCTGGC AAAGATGGCG TCCGTGGTCT GACCGGCCCC ATTGGTCCTC CTGGCCCTGC 

1870 1880 1890 1900 1910 1920 

TGGTCCCCCT GGTGACAAGG GTGAAAGTGG TCCCAGCGGC CCTGCTGGTC CCACTGGAGC 

1930 1940 1950 1960 1970 1980 

TCGTGGTGCC CCCGGAGACC GTGGTGAGCC TGGTCCCCCC GGCCCTGCTG GCTTTQCTGG 

1990 2000 2010 2020 2030 2040 

CCCCCCTGGT GCTGACGGCC AACCTGGTGC TAAAGGCGAA CCTGGTGATG CTGGTQCCAA 

2050 2060 2070 2080 2090 2100 

AGGCGATGCT GGTCCCCCTG GGCCTGCCGG ACCCGCTGGA CCCCCTGGCC CCATTGCTAA 

2110 2120 2130 2140 2150 2160 

TGTTGGTGCT CCTGGAGCCA AAGGTGCTCG CGGCAGCGCT GGTCCCCCTG GTGCTACTCG 

2170 2180 2190 2200 2210 2220 

TTTCCCTGGT GCTGCTGGCC GAGTCCCTCC TCCTGGCCCC TCTGGAAATG CTGGACCCCC 

2230 2240 2250 2260 2270 2280

TGGCCCTCCT GGTCCTGCTG GCAAAGAAGG CGCCAAAGGT CCCCGTGGTG AGACTGGCCC 

2290 2300 2310 2320 2330 2340 

TGCTGGACGT CCTGGTGAAG TTGCTCCCCC TGGTCCCCCT GGCCCTGCTG GCGAGAAAGC 

2350 2360 2370 2380 2390 2400

ATCCCCTGGT GCTGATGGTC CTGCTGCTGC TCCTGGTACT CCCGGGCCTC AAGGTATTGC 

2410 2420 2430 2440 2450 2460 

TGGACAGCGT GGTGTGGTCG GCCTGCCTGC TCAGAGACGA GAGAGAGGCT TCCCTGGTCT 

2470 2480 2490 2500 2510 2520 

TCCTGGCCCC TCTGCTCAAC CTGGCAAACA AGGTCCCTCT GGAGCAAGTG GTGAACCTGG 
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2530 2540 2550" 2560 . 2570 2580 

TCCCCCCGGT CCCATGGGCC CCCCTGGATT GGCTGGACCC CCTGCTCAAT CTGGACGTGA 

2590 2600 2610 2620 2630 2640 

GGGGGCTCCT GCTGCCGAAG GTTCCCCTGG ACGAGACGGT TCTCCTGGCG CCAAGGGTGA 

2650 2660 2670 2680 2690 2700 

CCGTGGTGAG ACCGGCCCCG CTGGACCCCC TGGTGCTCOT GGTGCTCOTC GTGCCCCTGG 

2710 2720 2730 2740 2750 2760 

CCCCCTTGGC CCTGCTGGCA AGAGTGGTGA TCGTGGTGAG ACTGGTCCTG CTGGTCCCGC 

2770 2780 2790 2800 2810 2820 

CGGTCCCCTC GCCCCCGCTG CCCCCCGTGC CCCCGCCGGA CCCCAAGCCC CCCGTGGTGA 

2830 2840 2850 2860 2870 2880 

CAAGGGTGAG ACAGGCGAAC AGGGCGACAG AGGCATAAAG GGTCACCGTG GCTTCTCTGG 

2890 2900 2910 2920 2930 2940

CCTCCAGCCT CCCCCTGGCC CTCCTCGCTC TCCTGCTGAA CAACGTCCCT CTGGAGCCTC 

2950 2960 2970 2980 2990 3000

TGGTCCTGCT GGTCCCCGAG GTCCCCCTGG CTCTGCTGGT GCTCCTGOCA AAGATGGACT 

3010 3020 3030 3040 3050 3060 

CAACGGTCTC CCTGGCCCCA TTGGGCCCCC TCGTCCTCGC GGTCGCACTG GTGATGCTCG 

3070 3080 3090 3100 3110 3120 

TCCTGTTGGT CCCCOCGGCC CTCCTGCACC TCCTGGTCCC CCTGGTCCTC CCAGCGCTGG 

^ ^ 3140 3150 • 3160 3170 3180 



TTTCGACTTC AGCTTCCTCC CCCAGCCACC TCAAGAGAAG GCTCACGATG GTGGCCGCTA 
CT^CCcS agatc S AGGATCTTCC CCCTGACACA AC^S .CC^ 
CAACAA^S ACCCAAA^ AAGATxS CTXTAAG^ C^oS t^C^SZ 
GATTCTTGTC AACAAtS TTAGCAA^T TAGTCC^ TAAc Cg ca|° *?? 
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CrC ctg tct tat ggc tat gat gag aaa tca acc gga gga a*. -» ^ : 

gl; » t y ; G i; ^ ^ G i: ^ ^ ^ ^ ^ n. *i ?» 

Sr 108 
CTC CCT GGC CCC CCT GG? GCA OCT GGT 



63 . 72 ^_. rrT( ^ rfr . cctSS? GCA CCT 05 

CCC ATG GGT CCC TCT GGT CCT CGT GGT ~ 



Pro ks' 



« civ m Ser Gly Pro Arg ay Leu Pro Gly Pro Pro. Gly .Mi Pro Gly 



153 152 



,n 126 13S 1« 

?ro Gift Gly Pte Glr. Gly Pro Pro Gly Glu Pro Gly Giu Pro Gly AU ser Gly 

171 180 189 198 207 216 

CCC ATG'gGT CCC CGA GGT CCC CCA GGT CCC CCT GGA ASG AAT ^ ». G,., w« 

Pro ifec Gly Pro Ar^ Pr0 Pro Gly Pro Pro Giy Lys Asr. Giy As? As? Gly 

22S 234 243 252 231 270 

GAA GCT GGA AAA CCT GGT CGT CCT GGT G?G CST GG5 CCT CCT GG3 CC? CAG GGT 

Glu Ala Gly Lys Pro Gly Arg Pro Gly Glu Arg Gly Pro Pro Gly Pro Gin Gly 

279 2e8 297 306 315 324, 

GCT CGA GGA'tTG CCC GGA ACS. GCT GGC CTC CCT GGA ATS ASG CSC SGA GGT 



Ala. As; Gly Leu Pro Gly Thr Ala. Gly Leu Pro Giy Hsc Lys Giy His Arc Gly 

333 342 351 350 365 378 

TTC AGT GGT TTC- GAT GGT CCC AAG GGA GAT GCT GGT CCT GCT GGT CCT SAG GGT 

5 he Ser Giy Leu As? Gly Ala Lys Gly Asp Ala Giy Pro Ala Gly Pro Lys Gly 

387 396 4C3 414 423 432 

GAG CCT GGC AGC CCT GGT GAA AAT GGA GCT CCT GGT CAG ATG GGC CCC CGT GGC 

Giu Pro Gly Ser Pro Gly Glu Asn Gly Ala Pro Giy Gin «a; Giv Pro Arc Giy 

441 450 453 463 477 455 

CTG CCT GGT GAG AGA GGT CGC CCT GGA GCC CC? GGC CCT GCT GGT GCT CGT GGA 

leu Pro Gly Glu Arg Gly Arg Pro Gly Ala Pro Giy Pro Ala Giy Ala Arc Giy 

495 • 504 513 522 531 540 

AAT GAT GGT GCT ACT GGT GCT GCC G3G CCC CCT GGT CCC SCC GGC CCC GCT GGT 

As.-. As? Gly Ala Tfcr Gly Ala Ala Gly Pro Pre Giy Pro Thr Giy Pro Ala Gly 

549 558 557 576 liz 594 

CCT CCT GGC TTC CCT GGT GCT GTT GGT GCT AAG GGT GAA GCT GGT CCC CAA GGG 



Pro Pro Gly Phe Pro Gly Ala Val- Gly Ala Lys Gly Glu Ala Giy Pro Gin Giy 

603 612 621 630 639 648 

CCC CGA GGC TCT GAA GGT CCC CAG GST GIG CGT GGT GAG CCT GGC CCC CCT GGC 

Pro Arg Gly Ser Glu Gly Pro Gin Gly Vai Arc Gly Giu ?r= Giy Pro Pro Gly 

657 . 665 575 534 553 702 

CCT GCT GGT GCT GCT GGC CCT GCT GGA AftC CCT GGT GCT GAT GGA CAG CCT GGT 

Pro Aia GIv Ala Ala Giv Pro Ala Gly Asa Pre Giv Ala Ass Giv Sir. Pro Giv 
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• 7U 720 '729 733 7<7 755 

GCT AAA GST GCC AVT GET GCT CCT GST ATT CC? GST GCT OCT GGC TTC CCT C-GT 

Aia Lys Gly Ala Asr. Gly Ala Pro Gly lie Ala Giy Ala Pro Gly ?he Pro C-ly 

765 77? 783 792 801 810 

GCC CGA GGC OX TCT GGA CCC OG GGC CCC GGC GGC CCT OCT GGT CCC AAG GGT 

Ala Arg Gly Pro Ser Gly Pro Gin Gly Pro Gly Gly Pro Pro Gly Pro Lys Gly 

819 828 837 846 855 864. 

ARC AGC GGT GAA CCT GG? GCT CCT GGC AGC AAA GGA GAC ACT GGT GCT AAG GGA 

Asn Ser Gly Glu Pro Gly Al2 Pro Gly Ser Lys Giy As? Thr Gly Ala Lys Giy 

873 882 891 900 509 918 

GAG CCT GGC OCT GTT GGT GTT CAA GGA CCC CCT GGC CC7 GCT GGA GAS GAA GGA 

Glu Pro Gly Pro Vai Giy Val Gin Gly Pro Pro Gly Pro Ala Gly Glu Giu Gly 

927 S36 945 954 963 972 

AAG CGA GGA GCT CGA GGT GAA. CCC GGA C0C ACT GGC CTG CCC GGA CCC CCT GGC 

Lys Arg Gly Ala Arc Gly Glu Pro Gly Pro Thr Gly Leu Pro Gly Pro Pro Gly 

981 • 950 999 1008 1017 1026 

GAG CGT GGT GGA CCT GGT AGC CGT GST TTC CCT GGC GCA GAT GST GTT GCT GGT 

Giu Arg Gly Gly Pro Giy Ser Arg Gly Pis Pro Gly Ala As? Giy Vai Ala Gly 

1035 10« 1053 1062 • 1071 1080 

CCC AAG GGT CCC GCT GGT GAA CGT GST TCT CCT GGC CCC GCT GGC COC AAA GGA 

Pro Lys Gly Pro Aia Giy Glu Arg Giy Ser Pro Giy Pro Ala Giy Pro Lys Giy 

10e9 1093 1107 1U5 1125 iijd 

3^ CCT GGT GAA CCT GGT CGT CCC GGT GAA GCT GGT CTG CCT GGT GCC AAG GGT 

Ser Pro Giy Giu Aia Giy Arg Pro Gly Gla Ala Giy Leu Pro Gly Ala Lys Giy 

1143 1152 1151 U70 1179 1188 

CTG ACT GGA AGC CCT GGC AGC CCT GGT CCT GAT GGC AAA ACT GGC CCC CCT GGT 



leu Thr Gly Ser Pro Gly Ser Pro Giy Pro As? Giy Lys Thr Gly Pro Pro Gly 

1197 1206 1215 1224 1233 1242 

?I ®* ®" CCC CCA GGC CCA CCT GGT GCC CGT GGT 

Pro Ala Gly Gin As?. Gly Arg Pro Gly Pre Pro Gly Pro Pro Gly Ala Arg Giy 

1251 1260 1269 1278 1287 1296 

CM* OC£ GGT GTG ATC-.GGA TTC OCT GGA CCT AAA GGT GCT GCT GGA GAG CCC. GGC 

Gin Ala Gly Val Mac Gly Phe Pro Gly Pro Lys Gly Aia Ala Giy Giu Pro Gly 

1305 1314 1323 1332 1341 1350 

AAG GCT GGA C?C CGA GGT GTT CCC GGA CCC CCT GGC GCT GTC GGT CCT GCT GGC 

Lys Ala Giy Giu Arc Gly Vai Pro Gly Pro Pro Gly Aia Val Gly Pro Ala Glv 
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1359 1368 1377 _1386 ^ ^ 
AAA GAT GGA GAG GCT GGA GCT CAG GGA CX CCT GX OCT . 

i;; a y au in ^ hi ai ~?* ^ a y ^ ay ^ ay 

1431 1440 1"9 1«3 

gag j« l S «* «x S ccr «« to: CCC GGA « CAG GGT CX XT ^ 
G Iu Hg ay Gl^ an Gly Pro lie ay Ser Pro ay ». a, Gly i*u Pro ay 

1485 1«94 1503 1S12 

CX CCt'gS XT XAxI GAA Oft AAA CCT GST GAA CAG GGT GTT CCT GGA 

HI Hy Pro Pro" ay Glu III ay Lys ?ro Gly Glu Gin Gly Vai Pro Gly 

1S2 , 1530 1539 1S<8 1557 «ff 

esc err gk gcc cct ggc ccc tct gga gca aga ggc g?g aga ^ 3^ ^ ^ 

tap -Zla Gly III Ho Gly Pro" Ser ay Ala Arg Gly Glu Arg ay Phe Pro ay 

1575 1S84 1593 1G02 1611 1620 

KG CST GST GIG CAA GGT CCC OCT GST CO GCT GGA OX OSA GSG GX MC GGT 

Glu Irs Gly vll Gin Gly Pro Pro ay Pro Ala Gly Pro Arg Gly Ale Asn ay 

1629 1638 1647 1656 1665 1674 

GCT CCC GX AAC GAT GGT GCT AAG XT GAT GCT GGT GCC CCT GGA GC7 CCC GST 

HI Pro ay Asp. As? Gly AU Lys ay As? Ala Giy Ala Pro ay Ala Pro Gly 

1533 1692 1701 1710 1719. 1728 

. CAG GGC GCC CCT GGC CTT CAG GGA ATG CCT GGT GAA CGT GGT GCA GCT GGT 

Ser Gir. Gly Ala Pro Gly leu Gin ay Me: Pro Gly Glu Arg Giy Ala Ala. Giy 

1737 1745 17S5 1764 1773 1782 

CTT CCA GGG CCT AAG GGT G.-C *GA GGT GAT GCT GGT CCC AAA GGT GCT GAT GGC 

Leu Pro ay Pro !ys ay As? Arg ay As? Si* Giy Pro Lys ay Ala Asp Gly 

1791 1800 1809 1818 1827 1335 

XT XT GGC. AAA GVT GGC GTC CGT GGT CTG £CC GGC CCC ATT GGT CCT CCT GGC 



Ser Pro ay Lys As? ay Vai Arg ay Leu Thr Gly Pro lie Giy Pro Pro Giy 

1845 1854 1863 1872 18ei 1850 

CCT GCT GGT GCC CCT GGT GAC AAG GGT GAA AGT GGT CCC AGC GGC CCT GCT GGT 

Pro Ala ay Ala Pro ay As? Lys ay Glu Ser Gly Pro Ser ay Pro Ala ay 

185S 1908 ■ 1917 1926 1933 1944 

XC AC? GGA GCT CGT GGT GCC CCC GGA GAC CGT GGT G?£ CCT GGT CCC CCC GGC 



Pro Thr ay Ala Arg Gly Ala Pro ay As? Arg ay Glu Pro Gly Pro Pro Gly 

1953 1962 1971 1980 1989 1998 

CCT GCT GGC TTT GCT GGC OX CCT GGT GCT GAC GGC CAA CCT GGT GCT AAA GGC 

Pro Ala Gly Phe Ala Gly Pro Pro ay Ala As? Gly Gin Pro Gly Ala Lys Giy 

2007 2015 2025 2034 2043 2052 

GAA CCT GGT GAT GCT GGT GX AAA GX GAT GCT GGT XC CCT GGS CCT GX GGA 

Glu Pro Gly As? Ala Gly Ala Lys Gly As? Ala Giy Prs Pro Giy Pro Ala Giy 
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2358 




2367 




2376 


CCT GGT 


ACT 
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CCT 
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Ser Pro Gly Ala As? Gly Pro AU Gly Ala Pro Giy Thr Pro Gly Pro Gin Gly 

2385 2394 2403 2412 2421 2430

ATT GCT GGA CAG CGT GGT GTG GTC GGC CTG CCT GGT CAG AGA GGA GAG AGA GGC

Ile Ala Gly Gln Arg Gly Val Val Gly Leu Pro Gly Gln Arg Gly Glu Arg Gly

2439 2448 2457 2466 2475 2484

TTC CCT GGT CTT CCT GGC CCC TCT GGT GAA CCT GGC AAA CAA GGT CCC TCT GGA

Phe Pro Gly Leu Pro Gly Pro Ser Gly Glu Pro Gly Lys Gln Gly Pro Ser Gly

2493 2502 2511 2520 2529 2538

GCA AGT GGT GAA CGT GGT CCC CCC GGT CCC ATG GGC CCC CCT GGA TTG GCT GGA

Ala Ser Gly Glu Arg Gly Pro Pro Gly Pro Met Gly Pro Pro Gly Leu Ala Gly

2547 2556 2565 2574 2583 2592

CCC CCT GGT GAA TCT GGA CGT GAG GGG GCT CCT GCT GCC GAA GGT TCC CCT GGA

Pro Pro Gly Glu Ser Gly Arg Glu Gly Ala Pro Ala Ala Glu Gly Ser Pro Gly

2601 2610 2619 2628 2637 2646

CGA GAC GGT TCT CCT GGC GCC AAG GGT GAC CGT GGT GAG ACC GGC CCC GCT GGA

Arg Asp Gly Ser Pro Gly Ala Lys Gly Asp Arg Gly Glu Thr Gly Pro Ala Gly

2655 2664 2673 2682 2691 2700

CCC CCT GGT GCT CCT GGT GCT CCT GGT GCC CCT GGC CCC GTT GGC CCT GCT GGC

Pro Pro Gly Ala Pro Gly Ala Pro Gly Ala Pro Gly Pro Val Gly Pro Ala Gly
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2709 2718 2727 2736 27-S5 Z75i 

AAG AGT GGT GAT CGT GGT GAG ACT GGT CCT GCT GGT CCC GCC GGT CCC GTC GGC

Lys Ser Gly Asp Arg Gly Glu Thr Gly Pro Ala Gly Pro Ala Gly Pro Val Gly

2763 2772 2781 2790 2799 2808

CCC GCT GGC GCC CGT GGC CCC GCC GGA CCC CAA GGC CCC CGT GGT GAC AAG GGT

Pro Ala Gly Ala Arg Gly Pro Ala Gly Pro Gln Gly Pro Arg Gly Asp Lys Gly

2817 2826 2835 2844 2853 2862

GAG ACA GGC GAA CAG GGC GAC AGA GGC ATA AAG GGT CAC CGT GGC TTC TCT GGC

Glu Thr Gly Glu Gln Gly Asp Arg Gly Ile Lys Gly His Arg Gly Phe Ser Gly

2871 2880 2889 2898 2907 2916

CTC CAG GGT CCC CCT GGC CCT CCT GGC TCT CCT GGT GAA CAA GGT CCC TCT GGA

Leu Gln Gly Pro Pro Gly Pro Pro Gly Ser Pro Gly Glu Gln Gly Pro Ser Gly

2925 2934 2943 2952 2961 2970

GCC TCT GGT CCT GCT GGT CCC CGA GGT CCC CCT GGC TCT GCT GGT GCT CCT GGC

Ala Ser Gly Pro Ala Gly Pro Arg Gly Pro Pro Gly Ser Ala Gly Ala Pro Gly

2979 2988 2997 3006 3015 3024

AAA GAT GGA CTC AAC GGT CTC CCT GGC CCC ATT GGG CCC CCT GGT CCT CGC GGT

Lys Asp Gly Leu Asn Gly Leu Pro Gly Pro Ile Gly Pro Pro Gly Pro Arg Gly

3033 3042 3051 3060 3069 3078

CGC ACT GGT GAT GCT GGT CCT GTT GGT CCC CCC GGC CCT CCT GGA CCT CCT GGT

Arg Thr Gly Asp Ala Gly Pro Val Gly Pro Pro Gly Pro Pro Gly Pro Pro Gly

3087 3096 3105 3114 3123 3132

CCC CCT GGT CCT CCC AGC GCT GGT TTC GAC TTC AGC TTC CTC CCC CAG CCA CCT

Pro Pro Gly Pro Pro Ser Ala Gly Phe Asp Phe Ser Phe Leu Pro Gln Pro Pro

3141 3150 3159 3168

CAA GAG AAG GCT CAC GAT GGT GGC CGC TAC TAC CGG GCT 3'

Gln Glu Lys Ala His Asp Gly Gly Arg Tyr Tyr Arg Ala



FIG. 27E 
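
The codon rows and three-letter residue rows in the listing above can be cross-checked mechanically. The following is a minimal sketch, not part of the original disclosure, that translates one whitespace-separated codon row with the standard genetic code; the names `CODON_TABLE` and `translate` are illustrative, and any codon that is not three valid bases (for example one damaged in reproduction) is reported as `???` rather than guessed.

```python
# Standard genetic code in the conventional T, C, A, G ordering
# (first base varies slowest, third base fastest).
BASES = "TCAG"
ONE_LETTER = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "End",
}

# Map every DNA codon to its three-letter residue name.
CODON_TABLE = {
    a + b + c: THREE_LETTER[ONE_LETTER[16 * i + 4 * j + k]]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(codon_line):
    """Translate a row of space-separated codons into residue names.

    RNA-style codons (with U) are accepted; unreadable codons give '???'.
    """
    residues = []
    for codon in codon_line.upper().replace("U", "T").split():
        residues.append(CODON_TABLE.get(codon, "???"))
    return " ".join(residues)


if __name__ == "__main__":
    # Codon row at positions 2655-2700 of the listing above.
    row = "CCC CCT GGT GCT CCT GGT GCT CCT GGT GCC CCT GGC CCC GTT GGC CCT GCT GGC"
    print(translate(row))
```

Comparing the printed residues with the residue row printed under each codon row flags positions where either row was damaged.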




Codon      HCol   ColECol

Proline
CCU         139        11
CCC          93        12
CCA           6        27
CCG           0       189

Glycine
GGU         174       147
GGC          97       179
GGA          64         8
GGG          11        12



FIG. 30 
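
Counts of the kind tabulated in FIG. 30 can be obtained by tallying in-frame codons of a coding sequence. The sketch below is an illustration only, not the procedure used to prepare the figure; the function name `codon_usage` and the constants `PROLINE` and `GLYCINE` are assumptions introduced here. It assumes the input DNA starts in frame and reports codons in RNA notation (U for T), as in the figure.

```python
from collections import Counter

# Synonymous codons for the two residues that dominate collagen sequences.
PROLINE = ("CCU", "CCC", "CCA", "CCG")
GLYCINE = ("GGU", "GGC", "GGA", "GGG")


def codon_usage(dna, codons):
    """Count how often each codon in `codons` occurs in frame in `dna`.

    Whitespace in `dna` is ignored, T is reported as U, and trailing
    bases that do not fill a complete codon are dropped.
    """
    rna = "".join(dna.split()).upper().replace("T", "U")
    usable = len(rna) - len(rna) % 3
    triplets = [rna[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(triplets)
    return {codon: counts[codon] for codon in codons}


if __name__ == "__main__":
    # Short in-frame fragment used purely as example input.
    fragment = "CCG ATG GGT CCG AGC GGC CCG CGT GGC"
    print(codon_usage(fragment, PROLINE))
    print(codon_usage(fragment, GLYCINE))
```

Running the same function over two full coding sequences and printing the results side by side gives a two-column comparison in the layout of the figure.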






[Plot not recoverable; axis label: [Hyp], mM]




FIG. 31 






[Plot not recoverable; labels: [NaCl], mM; IPTG]



FIG. 32 






[Plot not recoverable; axis label: Temperature]



FIG. 34 







FIG. 35 







FIG. 36 







FIG. 38 






9 18 27 36 45 54

CAG CTG AGC TAT GGC TAT GAT GAA AAA AGC ACC GGC GGC ATC AGC GTG CCG GGC

Gln Leu Ser Tyr Gly Tyr Asp Glu Lys Ser Thr Gly Gly Ile Ser Val Pro Gly

63 72 81 90 99 108

CCG ATG GGT CCG AGC GGC CCG CGT GGC CTG CCG GGC CCG CC* GGT GCG GCC GGT

Pro Met Gly Pro Ser Gly Pro Arg Gly Leu Pro Gly Pro Pro Gly Ala Pro Gly

117 126 135 144 153 162

CCG CAG GGC TTT CAG GGT CCG CCG GGC GAA CCG GGC GAA CCT GGT GCG AGC GGC

Pro Gln Gly Phe Gln Gly Pro Pro Gly Glu Pro Gly Glu Pro Gly Ala Ser Gly

171 180 189 198 207 216

CCG ATG GGC CCG CGC GGC CCG CCG GGT CCG CCA GGC AAA AAC GGC GAT GAT GGC

Pro Met Gly Pro Arg Gly Pro Pro Gly Pro Pro Gly Lys Asn Gly Asp Asp Gly

225 234 243 252 261 270

GAA GCG GGC AAA CCG GGA CGT CCG GGT GAA CGT GGC CCC CCG GGC CCG CAG GGC

Glu Ala Gly Lys Pro Gly Arg Pro Gly Glu Arg Gly Pro Pro Gly Pro Gln Gly

279 288 297 306 315 324

GCG CGC GGA CTG CCG GGT ACT GCG GGA CTG CCG GGC ATG AAA GGC CAC CGC GGT

Ala Arg Gly Leu Pro Gly Thr Ala Gly Leu Pro Gly Met Lys Gly His Arg Gly

333 342 351 360 369 378

TTC TCT GGT CTG GAT GGT GCC AAA GGA GAC GCG GGT CCG GCG GGT CCG AAA GGT

Phe Ser Gly Leu Asp Gly Ala Lys Gly Asp Ala Gly Pro Ala Gly Pro Lys Gly

387 396 405 414 423 432

GAG CCG GGC AGC CCG GGC GAA AAC GGC GCG CCG GGT CAG ATG GGC CCG CGT GGC

Glu Pro Gly Ser Pro Gly Glu Asn Gly Ala Pro Gly Gln Met Gly Pro Arg Gly

441 450 459 468 477 486

CTG CCT GGT GAA CGC GGT CGC CCG GGC GCC CCG GGC CCA GCT GGC GCA CGT GGC

Leu Pro Gly Glu Arg Gly Arg Pro Gly Ala Pro Gly Pro Ala Gly Ala Arg Gly

495 504 513 522 531 540

AAC GAT GGT GCG ACC GGT GCG GCC GGT CCA CCG GGC CCG ACG GGC CCG GCG GGT

Asn Asp Gly Ala Thr Gly Ala Ala Gly Pro Pro Gly Pro Thr Gly Pro Ala Gly

549 558 567 576 585 594

CCC CCG GGC TTT CCG GGT GCG GTG GGT GCG AAA GGC GAA GCA GGT CCG CAG GGG

Pro Pro Gly Phe Pro Gly Ala Val Gly Ala Lys Gly Glu Ala Gly Pro Gln Gly

603 612 621 630 639 648

CCG CGC GGG AGC GAG GGT CCT CAG GGC GTT CGT GGT GAA CCG GGC CCG CCG GGC

Pro Arg Gly Ser Glu Gly Pro Gln Gly Val Arg Gly Glu Pro Gly Pro Pro Gly

657 666 675 684 693 702

CCG GCG GGT GCG GCG GGC CCG GCT GGT AAC CCT GGC GCG GAC GGT CAG CCA GGT

Pro Ala Gly Ala Ala Gly Pro Ala Gly Asn Pro Gly Ala Asp Gly Gln Pro Gly
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"20 "9 733 711 _« 

a "G AAA GCC AAC GGC GCG CCG BI BT » BT « K « ^ « « 

£ Lys cly to ^ Gly A^a Pro «i U. Gly Ala Pro Gly ? he Pro Gly 

785 792 801 810 

GCC CGC GGC CCG TCC GGC CCG CAG G3C CCG GGC CGC CCG CCC GGC CCG AAA GGG 

'pi: ^ ciy ;» ^ ^ *~ ^ > ro Giy ay ero ?ro ay ?5 ° Lys ay 

619 



E28 837 846 855 864 



Asn Ser Gly Glu Pro cly All Pro Gly Ser Lys Gly As? T>_- Gly .Ma Lys Gly 

„, =52 6S1 900 903 918 

GAA CCG GGC CCA GIG GST GTT CAA GEC CCG CCG GGC COG GCG GGC GAG GAA GGC 

Glu Pro Gly P» vll Gly Vai Gin Gly Pre Pro Gly Pro Ala Gly C-lu Gin Gly 

927 536 545 954 963 972 

Lys Arg Gly AU Arg Giy Glu Pro Giy Pre Thr Gly Leu Pro Gly Pro Pre Gly 

c B i • =50 599 1003 1017 1026 

CAA CGT GGT GGC CCG GGT AGO CGC GGT TTT CCG GGC GCG GAT GGT GIG GCG GGC 

Glu Arg Giy Gly Pro Sly Ser Arg Gly Pfcs Pro Gly .Ala As? Gly Val Ala Gly 

'03S i:-4< 1053 10S2 1071 1080 

CCG AAA GGT CCG GCG GGT GAA CGT GGT AGC CCG GGC CCG GCG GGC CCA AAA GGC

Pro Lys Gly Pro Ala Gly Glu Arg Gly Ser Pro Gly Pro Ala Gly Pro Lys Gly

1089 1098 1107 1116 1125 1134

AGC CCG GGC GAG GCA GGA CGT CCG GGT GAA GCG GGT CTC CCG GGC GCC AAA GGT

Ser Pro Gly Glu Ala Gly Arg Pro Gly Glu Ala Gly Leu Pro Gly Ala Lys Gly

1143 1152 1161 1170 1179 1188 

CTG ACC GGC TCT CCG GGC AGC CCG GGT CCG GAT GGC AAA ACG GGC CCG CCT GGT

Leu Thr Gly Ser Pro Gly Ser Pro Gly Pro Asp Gly Lys Thr Gly Pro Pro Gly

1197 1206 1215 1224 1233 1242

CCG GCC GGC CAG GAT GGT CGC CCG GGC CCG CCG GGC CCG CCG GGT GCC CGT GGT

Pro Ala Gly Gln Asp Gly Arg Pro Gly Pro Pro Gly Pro Pro Gly Ala Arg Gly

1251 1260 1269 1278 1287 1296

CAG GCG GGT GTC ATG GGC TTT CCA GGC CCC AAA GGT GCG GCG GGT GAA CCG GGC 

Gln Ala Gly Val Met Gly Phe Pro Gly Pro Lys Gly Ala Ala Gly Glu Pro Gly

1305 1314 1323 1332 1341 1350 

AAA GCG GGC GAA CGC GGT GTC CCG GGT CCG CCG GGC GCT GTC GGG CCG GCG GGC 



Lys Ala Gly Glu Arg Gly Val Pro Gly Pro Pro Gly Ala Val Gly Pro Ala Gly

1359 1368 1377 1386 1395 1404

AAA GAT GGC GAA GCG GGC GCG CAA GGC CCG CCG GGA CCA GCG GGT CCG GCG GGC 

Lys Asp Gly Glu Ala Gly Ala Gln Gly Pro Pro Gly Pro Ala Gly Pro Ala Gly
1413 1422 1431 1440 1449 1458

« «'£ « « S CCS «■« ««««««««« 

oi: w a, a;; « ^ <=* * « *> a ° civ -~ ! " ay 

1467 1476 1485 1494 1503 1512

z: £ ;» 5; ^ <ay ^ ^ ^ « a, 

^ ctg'Xc ax c^S ccg agc'S 00= «« « ogc'oS rxc ccg'ggc 

Ha aC Ma Pro Gly Pro Ser Gly Ala Arg Gly Glu Arg Gly Phe Pro Gly

1575 1584 1593 1602 1611 1620

GAA CGT GGT GTG CAG GGC CCG CCC GGC CCG GCT GGT CCG CGC GGC GCC AAC GGC

Glu Arg Gly Val Gln Gly Pro Pro Gly Pro Ala Gly Pro Arg Gly Ala Asn Gly

1629 1638 1647 1656 1665 1674

GCG CCG GGC AAC GAT GGT GCG AAA GGT GAT GCG GGT GCC CCA GGT GCG CCG GGC 

Ala Pro Gly Asn Asp Gly Ala Lys Gly Asp Ala Gly Ala Pro Gly Ala Pro Gly

1683 1692 1701 1710 1719 1728

AGC CAG GGC GCC CCG GGG CTG CAA GGC ATG CCG GGT GAA CGT GGT GCC GCG GGT

Ser Gln Gly Ala Pro Gly Leu Gln Gly Met Pro Gly Glu Arg Gly Ala Ala Gly

1737 1746 1755 1764 1773 1782 

CTA CCG GGT CCG AAA GGC GAC CGC GGT GAT GCC GGT CCA AAA GGT GCG GAT GGC

Leu Pro Gly Pro Lys Gly Asp Arg Gly Asp Ala Gly Pro Lys Gly Ala Asp Gly

1791 1800 1809 1818 1827 1836

TCC CCT GGC AAA GAT GGC GTT CG? GGT CTG ACC GGC CCG ATC GGC CCG CCG GGC

Ser Pro Gly Lys Asp Gly Val Arg Gly Leu Thr Gly Pro Ile Gly Pro Pro Gly

1845 1854 1863 1872 1881 1890

CCG GCA GGT GCC CCG GGT GAC AAA GGT GAA AGC GGT CCG AGC GGC CCA GCG GGC

Pro Ala Gly Ala Pro Gly Asp Lys Gly Glu Ser Gly Pro Ser Gly Pro Ala Gly 

1899 1908 1917 1926 1935 1944 

CCC ACT GGT GCG CGT GGT GCC CCG GGC GAC CGT GGT GAA CCG GGT CCG CCG GGC

Pro Thr Gly Ala Arg Gly Ala Pro Gly Asp Arg Gly Glu Pro Gly Pro Pro Gly

1953 1962 1971 1980 1989 1998

CCG GCG GGC TTT GCG GGC CCG CCA GGC GCT GAC GGC CAG CCG GGT GCG AAA GGC

Pro Ala Gly Phe Ala Gly Pro Pro Gly Ala Asp Gly Gln Pro Gly Ala Lys Gly

2007 2016 2025 2034 2043 2052 

GAA CCG GGG GAT GCG GGT GCC AAA GGC GAC GCG GGT CCG CCG GGC CCT GCC GGC 

Glu Pro Gly Asp Ala Gly Ala Lys Gly Asp Ala Gly Pro Pro Gly Pro Ala Gly

2061 2070 2079 2088 2097 2106 

CCG GCG GGC CCG CCA GGC CCG ATT GGC AAC GTG GGT GCG CCG GGT GCC AAA GGT

Pro Ala Gly Pro Pro Gly Pro Ile Gly Asn Val Gly Ala Pro Gly Ala Lys Gly
2115 2124 2133 2142 2151 2160

CCG CGC GGC AGC GCT GGT CCG CCC GGC CCG ACC GGT TTC CCC GGT GCC GCG G5-

l:: ci; ^ £ ^ ■:-"= G : y ^ ^ ^ ^ **» — * a 

2169 2178 2187 2196 2205 2214

CGC GTG GGT CCG CCA S CCG AGC GGT AAC GCG GGC CCG CCG GGC CCG CCG GGC

Arg Val Gly Pro Pro Gly Pro Ser Gly Asn Ala Gly Pro Pro Gly Pro Pro Gly

2223 2232 2241 2250 2259 2268

CCG GCG GGC AAA GAG GGC GGC AAA GGT CCG CGT GGT GAA ACC GGC CCT GCG GGA

Pro Ala Gly Lys Glu Gly Gly Lys Gly Pro Arg Gly Glu Thr Gly Pro Ala Gly

2277 2286 2295 2304 2313 2322

CGT CCA GGT GAA GTG GGT CCG CCG GGC CCG CCG GGC CCG GCG GGC GAA AAA GGT

Arg Pro Gly Glu Val Gly Pro Pro Gly Pro Pro Gly Pro Ala Gly Glu Lys Gly

2331 2340 2349 2358 2367 2376

AGC CCG GGT GCG GAT GGT CCC GCC GGT GCG CCA GGC ACG CCG GGT CCG CAA GGT

Ser Pro Gly Ala Asp Gly Pro Ala Gly Ala Pro Gly Thr Pro Gly Pro Gln Gly

2385 2394 2403 2412 2421 2430

ATC GCT GGC CAG CGT GGT GTC GTC GGG CTG CCG GGT CAG CGC GGC GAA CGC GGC

Ile Ala Gly Gln Arg Gly Val Val Gly Leu Pro Gly Gln Arg Gly Glu Arg Gly

2439 2448 2457 2466 2475 2484

TTT CCG GGT CTG CCG GGC CCG AGC GGT GAG CCC GGC AAA CAG GGT CCA TCT GGC 

Phe Pro Gly Leu Pro Gly Pro Ser Gly Glu Pro Gly Lys Gln Gly Pro Ser Gly

2493 2502 2511 2520 2529 2538

GCG AGC GGT GAA CGT GGC CCG CCG GGT CCC ATG GGC CCG CCG GGT CTG GCG GGC 



Ala Ser Gly Glu Arg Gly Pro Pro Gly Pro Met Gly Pro Pro Gly Leu Ala Gly

2547 2556 2565 2574 2583 2592

CCT CCG GGT GAA AGC GGT CGT GAA GGC GCG CCG GGT GCC GAA GGC AGC CCA GGC 

Pro Pro Gly Glu Ser Gly Arg Glu Gly Ala Pro Gly Ala Glu Gly Ser Pro Gly

2601 2610 2619 2628 2637 2646

CGC GAC GGT AGC CCG GGC GCC AAA GGG GAT CGT GGT GAA ACC GGC CCG GCG GGC

Arg Asp Gly Ser Pro Gly Ala Lys Gly Asp Arg Gly Glu Thr Gly Pro Ala Gly

2655 2664 2673 2682 2691 2700

CCC CCG GGT GCA CCG GGC GCG CCG GGT GCC CCA GGC CCG GTG GGC CCG GCG GGC

Pro Pro Gly Ala Pro Gly Ala Pro Gly Ala Pro Gly Pro Val Gly Pro Ala Gly

2709 2718 2727 2736 2745 2754

AAA AGC GGT GAT CGT GGT GAG ACC GGT CCG GCG GGC CCG GCC GGT CCG GTG GGC

Lys Ser Gly Asp Arg Gly Glu Thr Gly Pro Ala Gly Pro Ala Gly Pro Val Gly

2763 2772 2781 2790 2799 2808

CCA GCG GGC GCC CGT GGC CCG GCC GGT CCG CAG GGC CCG CGG GGT GAC AAA GGT

Pro Ala Gly Ala Arg Gly Pro Ala Gly Pro Gln Gly Pro Arg Gly Asp Lys Gly